SaSHiMi - Sparse Direct Solver using Hierarchical Matrices

Nowadays, the number of computational cores in supercomputers has grown largely to a few millions. However, this trend in the number of computational units is not followed by the respective increase in the amount of memory available, and the memory per core ratio is quickly decreasing with the advent of accelerators. To face this problem, applications must not only be well parallelized, but they also need to reduce their memory consumption. Thus, work must be done at all levels of applications to fit those new needs. In scientific and engineering simulations often arise the need to solve large sparse systems of linear equations which is a crucial and time-consuming step. To tackle this problem, work has been started in recent years on matrix compression techniques to reduce the memory and computation complexities. It can be sorted in three main families: the flat format (BLR), the hierarchical non-nested formats (H and HODLR), and the hierarchical nested formats (H^2 and HSS) which present various levels of compression and are more or less complex to implement. This has been widely studied in the dense case, but work on sparse matrices is more recent and it has mainly been studied for the class of multifrontal solvers to reduce the memory overhead of the so-called frontal matrices as well as time to solution. The SaSHiMi project will especially investigate supernodal approaches. We believe this method will provide a larger parallelism, and is the only solution to efficiently reduce the memory peak of the solver as it does not have the drawback of handling frontal matrices.

The project will continue preliminary work on low-rank matrices by moving toward more complex hierarchical matrix compression format which offers larger compression rates, reducing memory and flops costs. The research will be made in the PaStiX library which exploits runtime systems from which the irregular kernels of the hierarchical format will benefit. The solver will be studied in both shared and distributed memory contexts, for which data mapping and scheduling will be adapted. The final result will be compared to two main competitors in terms of accuracy, time, and memory.

The main difficulty of the hierarchical format will reside in the complexity of performing updates. To ease those updates and increase their efficiency, new ordering heuristics will be studied to fit the requirements of such matrix representations. Those ordering techniques will operate at two levels: directly in the nested dissection algorithm to modify the choice of the separators in order to align them as much as possible, and on the algorithm used to reorder the unknowns. This should reduce the number of updates as well as the number of compression attempts. Not only the project will benefit from this task, but also all sparse direct solvers with full or low-rank capabilities.

We expect the outcome of the project to be of great interest to the scientific computing community and to researchers from many other fields. In this purpose, all the results will be developed in open source libraries, PaStiX for the solver and Scotch for the ordering heuristics.

Finally, this project will target two kinds of applications to validate the choices made and to open new opportunities of research. The first one will study the impact of low-rank solver onto hybrid solvers, and the second one will study the impact on applications from the plasma-fusion simulation program of the ITER project.