GPU-Acceleration of the ELPA2 Distributed Eigensolver for Applications in Electronic Structure Theory

ORAL

Abstract

The solution of eigenproblems is often a key computational bottleneck that limits the system sizes tractable in electronic structure theory. For large systems, these eigenproblems can easily exceed the capacity of a single computer and must therefore be solved on distributed-memory parallel computers. The ELSI library facilitates large-scale electronic structure calculations by providing a unified interface to various fast and scalable eigensolvers and density matrix solvers, including the EigenExa, ELPA, libOMM, NTPoly, PEXSI, and SLEPc libraries. The widespread adoption of hybrid CPU-GPU nodes in supercomputing opens up new opportunities to accelerate electronic structure calculations. Here we present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of its existing cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which is known to be the main bottleneck of the two-stage tridiagonalization algorithm. CPU, GPU, and MPI activities are overlapped wherever possible. Robust choices that maximize the GPU compute intensity are identified. We demonstrate the performance of this GPU-accelerated eigensolver with a set of benchmark calculations.
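
For orientation, the two-stage tridiagonalization approach named in the abstract can be sketched in standard textbook form as follows. The symbols A, B, T, Q_1, Q_2, V, X, and Λ are notation introduced here for illustration and do not come from the abstract; this is a generic outline of the method, not a description of ELPA2's internal implementation.

```latex
% Standard eigenproblem for a Hermitian (or real symmetric) matrix A:
%   A X = X \Lambda
\begin{align}
  B &= Q_1^{H} A \, Q_1   && \text{(stage 1: full matrix to banded form)} \\
  T &= Q_2^{H} B \, Q_2   && \text{(stage 2: banded to tridiagonal form)} \\
  T V &= V \Lambda        && \text{(solve the tridiagonal eigenproblem)} \\
  X &= Q_1 \left( Q_2 V \right) && \text{(two-step back-transformation of eigenvectors)}
\end{align}
```

Because the reduction passes through an intermediate banded matrix, the eigenvectors V of the tridiagonal matrix have to be back-transformed twice (first by Q_2, then by Q_1) to obtain the eigenvectors X of the original matrix. This extra back-transformation step is the additional cost of the two-stage approach relative to a direct, one-stage tridiagonalization.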

*This work is supported by NSF under grant number 1450280.

Presenters

  • Victor Yu

    • Duke University
    • Department of Mechanical Engineering and Materials Science, Duke University

Authors

  • Victor Yu

    • Duke University
    • Department of Mechanical Engineering and Materials Science, Duke University
  • Jonathan Moussa

    • The Molecular Sciences Software Institute
  • Volker Blum

    • Department of Mechanical Engineering and Materials Science, Duke University
    • Duke University
    • Mechanical Engineering and Materials Science; Chemistry, Duke University