OMDB-GAP1: A new dataset for band gap predictions for large organic crystal structures
ORAL
Abstract
Large datasets of ab initio calculations have enabled many pioneering studies of machine learning applied to quantum-chemical systems. For example, machine learning models have achieved chemical accuracy on the popular QM9 dataset, which contains small organic molecules. Here, we present a new, more challenging dataset of 12,500 large organic crystal structures and their corresponding DFT band gaps, freely available at https://omdb.diracmaterials.org/dataset. The dataset is based on the Organic Materials Database (OMDB), which hosts electronic properties of previously synthesized organic crystal structures. With an average of 85 atoms per unit cell, this dataset provides a new challenge for machine learning applications. We also evaluate the performance of two recent machine learning models on this new dataset: Kernel Ridge Regression with the Smooth Overlap of Atomic Positions (SOAP) kernel and the deep learning model SchNet.
*the Swedish Research Council (638-2013-9243), the Knut and Alice Wallenberg Foundation, the European Research Council (DM-321031), the Marie Sklodowska-Curie grant agreement no. 713683 (COFUNDfellowsDTU), the Swedish National Infrastructure for Computing (SNIC) at the Center for High Performance Computing (PDC) and the High Performance Computing Center North (HPC2N)
Presenters
- Bart Olsthoorn, NORDITA