Wednesday, March 15 • 4:20pm - 4:40pm
Algorithms and Performance: High Performance Low Rank Schur Complement for the Helmholtz Equation

Solving the Helmholtz equation is an important challenge in large-scale 3D seismic applications. A sparse direct solver is the method of choice when many right-hand sides must be solved on a large domain for a selected set of frequencies. In particular, multifrontal (e.g., MUMPS/SuperLU) and supernodal (e.g., PaStiX) solvers exhibit formally dense Schur complements on the last root separators and the first-level blocks, respectively, obtained from nested dissection graph partitioning. These dense Schur complements turn out to operate on data-sparse, low-rank structured matrix blocks. Data compression (through SVD, RSVD, RRQR, ACA, etc.) can then be applied by approximating each underlying tile to a given accuracy threshold. After truncation, the resulting low-rank tile data structure corresponds to an outer product of two tall-and-skinny matrices of width k, where k is the rank of the compressed block. The Schur complement computation must now take this new data structure into account. There have been many recent works on exploiting low-rank structure in direct sparse solvers and preconditioning [Engquist and Ying 2011, Kriemann 2013, Xia 2013, Ambikasaran 2013, Aminfar et al. 2014, Amestoy et al. 2015, etc.].
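As an illustration of the tile compression step described above, the following is a minimal NumPy sketch of truncated-SVD compression to a relative accuracy threshold. It is not taken from HiCMA or any of the cited solvers: the function name compress_tile, the relative spectral-norm truncation rule, and the smooth real-valued test kernel are all assumptions chosen for the example.

```python
import numpy as np

def compress_tile(tile, tol):
    """Hypothetical helper: return (U, V) with tile ~= U @ V.T.

    Singular values not exceeding tol * (largest singular value) are dropped,
    so the spectral-norm error is at most tol times the tile's spectral norm.
    """
    W, s, Zt = np.linalg.svd(tile, full_matrices=False)
    # Rank k = number of singular values kept (0 for an all-zero tile).
    k = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0
    U = W[:, :k] * s[:k]   # fold the singular values into the left factor
    V = Zt[:k, :].T        # tall-and-skinny right factor of width k
    return U, V

# Assumed test tile: a smooth kernel between two well-separated point sets,
# which is the kind of block that is expected to be numerically low rank.
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(2.0, 3.0, 64)
tile = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))

U, V = compress_tile(tile, 1e-8)
k = U.shape[1]
print("rank:", k)
print("stored entries:", U.size + V.size, "instead of", tile.size)
```

The two factors hold k(m + n) entries for an m-by-n tile, compared with mn for the dense tile, which is where the memory savings come from when k is small.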
We design and implement a new, efficient Schur complement computation on massively parallel hardware architectures, such as Intel KNLs and NVIDIA Pascal GPUs. The main idea is to refactor the Schur complement code by moving from a tile-centric to a kernel-centric variant in order to expose batch kernel executions. The low arithmetic intensity of the numerical kernels (due to very small rank sizes) and the resulting latency overhead can be compensated for by increasing occupancy on the system. This requires extending the standard BLAS kernels to batch execution mode and to variable block sizes, so as to handle the rank size heterogeneity. This new BLAS kernel collection is being actively investigated by the community and industrial vendors from both API and performance points of view.
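To make the tile-centric versus kernel-centric distinction concrete, here is a rough NumPy sketch of one way to batch many small products of low-rank factors. It is an assumption-laden illustration, not the KBLAS or HiCMA interface: the function batched_core_products is hypothetical, and grouping operands by shape is a simple stand-in for a true variable-size batch kernel. The small products V_a^T U_b arise when two low-rank tiles U_a V_a^T and U_b V_b^T are multiplied.

```python
import numpy as np
from collections import defaultdict

def batched_core_products(pairs):
    """Hypothetical helper: compute Va.T @ Ub for every (Va, Ub) in pairs.

    Instead of looping tile by tile, operands with identical shapes are
    stacked and multiplied with a single batched matmul per shape group.
    Results are returned in the same order as the input pairs.
    """
    groups = defaultdict(list)
    for idx, (Va, Ub) in enumerate(pairs):
        groups[(Va.shape, Ub.shape)].append(idx)

    out = [None] * len(pairs)
    for idxs in groups.values():
        A = np.stack([pairs[i][0] for i in idxs])   # (batch, n, ka)
        B = np.stack([pairs[i][1] for i in idxs])   # (batch, n, kb)
        C = np.matmul(A.transpose(0, 2, 1), B)      # (batch, ka, kb)
        for j, i in enumerate(idxs):
            out[i] = C[j]
    return out

# Usage: factors of tile size 16 with heterogeneous (assumed) ranks.
rng = np.random.default_rng(0)
ranks = [(2, 3), (4, 4), (2, 3), (1, 5), (4, 4)]
pairs = [(rng.standard_normal((16, ka)), rng.standard_normal((16, kb)))
         for ka, kb in ranks]
cores = batched_core_products(pairs)
print([c.shape for c in cores])
```

Grouping by shape keeps the sketch short; padding all factors to a common maximum rank would be another way to handle the rank heterogeneity, at the cost of extra arithmetic.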
We present the HiCMA library, which performs hierarchical computations on manycore architectures using low-rank, tile-structured matrices. For performance on NVIDIA GPUs, HiCMA relies on the KBLAS library, which implements highly tuned batch BLAS kernels for very small, variable rank sizes. On Intel architectures, HiCMA relies on OpenMP to batch sequential MKL kernels across the processing units. We describe the Schur complement computation for the Helmholtz equation within these frameworks. We report the resulting memory footprint, arithmetic complexity, and performance on various hardware architectures and compare against state-of-the-art dense numerical libraries.


Henri Calandra

Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of the French Atomic Agency. In 1990 he started working for Total SA. After 12 years of work in high performance computing and as project leader for Pre-stack Depth...

Ernesto Prudencio

Senior Software Engineer, Schlumberger
Ernesto E. Prudencio combines a BSc in electronics engineering (1990, Brazil), an MSc in applied mathematics (domain decomposition methods, 2001, Brazil), and a PhD in computer science / numerical analysis (PDE-constrained optimization, 2005, Boulder, CO) with professional experience in industry (IBM, Integris, Schlumberger), national laboratories (ANL, SLAC), and academia (UT Austin). He has been working at Schlumberger since September of...


Hatem Ltaief

Senior Research Scientist, KAUST
High performance computing | Numerical linear algebra | Performance optimization

Wednesday March 15, 2017 4:20pm - 4:40pm
Room 280