The traditional GW plus Bethe-Salpeter equation (GW-BSE) approach has, in practice, been prohibitively expensive for systems with more than 50 atoms. We show that, through a combination of methodological and algorithmic improvements, the standard GW-BSE approach can be applied to systems with hundreds of atoms. We will discuss the massively parallel GW-BSE implementation in the BerkeleyGW package (which runs on top of common DFT packages), including the importance of hybrid MPI-OpenMP parallelism, parallel I/O, and library performance. We will also discuss optimization strategies for, and performance on, many-core architectures.
*Support for this work is provided through the Scientific Discovery through Advanced Computing (SciDAC) program, funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Number DE-FG02-12ER4
Authors
Jack Deslippe
Lawrence Berkeley National Lab
Andrew Canning
Lawrence Berkeley National Lab
Computational Research Division, Lawrence Berkeley National Laboratory
Yousef Saad
University of Minnesota
James R. Chelikowsky
University of Texas at Austin
Institute of Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX
Steven G. Louie
Department of Physics, University of California at Berkeley and Materials Sciences Division, Lawrence Berkeley National Laboratory
UC Berkeley physics/ LBNL MSD
Dept. of Physics UC Berkeley and Lawrence Berkeley National Lab
University of California - Berkeley, Lawrence Berkeley National Laboratory
Physics Department, UC Berkeley and Lawrence Berkeley National Lab
University of California at Berkeley
University of California, Berkeley
UC Berkeley and Lawrence Berkeley National Laboratory
Univ of California - Berkeley
Dept. of Physics, University of California, Berkeley and Materials Science Division, Lawrence Berkeley National Laboratory