GPU-Acceleration of the ELPA2 Distributed Eigensolver for Applications in Electronic Structure Theory
ORAL
Abstract
The solution of eigenproblems is often a key computational bottleneck that limits the system sizes tractable in electronic structure theory. For large systems, these eigenproblems can easily exceed the capacity of a single computer and must therefore be solved on distributed-memory parallel computers. The ELSI library facilitates large-scale electronic structure calculations by providing a unified interface to a variety of fast and scalable eigensolvers and density matrix solvers. The widespread adoption of hybrid CPU-GPU nodes in supercomputers opens up new opportunities to accelerate electronic structure calculations. Here we present our GPU-oriented (NVIDIA) development of the ELPA two-stage tridiagonalization eigensolver (ELPA2). This work includes GPU offloading based on the cuBLAS library and CUDA kernels that speed up the back-transformation of eigenvectors, which can be the most computationally expensive part of the two-stage tridiagonalization algorithm. We identify robust configuration choices that maximize GPU performance. We demonstrate the performance of this GPU-accelerated eigensolver with a set of benchmark calculations on the Summit supercomputer. This work is supported by NSF under Award No. 1547580 and Award No. 1450280.
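To make the back-transformation step concrete, here is a minimal CPU sketch in NumPy under simplifying assumptions. It uses a one-stage Householder reduction straight to tridiagonal form rather than ELPA2's two-stage reduction (full to banded, then banded to tridiagonal). It also solves the tridiagonal eigenproblem with `numpy.linalg.eigh`. The function names are hypothetical illustrations, not part of ELPA or ELSI. Given A = Q T Q^T and T Y = Y Λ, the eigenvectors of A are X = Q Y. These are obtained by applying the stored reflectors to Y in reverse order. In a two-stage scheme the eigenvectors have to be back-transformed through both reductions, and it is dense reflector updates of this kind that make up the back-transformation.

```python
import numpy as np


def householder_tridiagonalize(a):
    """Reduce a symmetric matrix to tridiagonal form T = Q^T A Q.

    Returns T and the Householder reflectors as (w, tau) pairs, where each
    reflector is H = I - tau * w w^T and Q = H_0 H_1 ... H_last.
    (Hypothetical helper; one-stage reduction for illustration only.)
    """
    t = np.array(a, dtype=float)
    n = t.shape[0]
    reflectors = []
    for k in range(n - 2):
        x = t[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column already reduced; no reflector needed
        alpha = -np.copysign(norm_x, x[0])  # sign choice avoids cancellation
        w = np.zeros(n)
        w[k + 1:] = x  # copies the values before t is modified
        w[k + 1] -= alpha
        tau = 2.0 / (w @ w)
        # Two-sided update T <- H T H (H is symmetric and orthogonal).
        t -= tau * np.outer(w, w @ t)
        t -= tau * np.outer(t @ w, w)
        reflectors.append((w, tau))
    # Discard rounding noise outside the tridiagonal band.
    t = np.triu(np.tril(t, 1), -1)
    return t, reflectors


def back_transform(reflectors, y):
    """Map eigenvectors Y of T to eigenvectors X = Q Y of A (hypothetical helper)."""
    x = np.array(y, dtype=float)
    # Q Y = H_0 (H_1 (... (H_last Y))), so apply the reflectors in reverse order.
    for w, tau in reversed(reflectors):
        x -= tau * np.outer(w, w @ x)
    return x


rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
a = 0.5 * (m + m.T)  # random symmetric test matrix
t, reflectors = householder_tridiagonalize(a)
evals, y = np.linalg.eigh(t)  # eigenpairs of the tridiagonal matrix
x = back_transform(reflectors, y)
# Check A X = X Lambda: column j of x should be an eigenvector with eigenvalue evals[j].
print(np.allclose(a @ x, x * evals))
```

Each reflector update touches every column of the eigenvector matrix, so the cost of this step grows with the number of requested eigenvectors. This is why the back-transformation can dominate when many eigenvectors are needed.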
Presenters
Victor Yu (Duke University, USA)