Hypothesis-driven active learning over the chemical space

ORAL

Abstract

From applications in identifying potential drug targets to designing electronics, catalysts, photovoltaics and chemical reactions, efforts to discover molecular candidates have risen steeply over the years. Rapid exploration of chemical space for desired functionalities is typically performed by high-throughput screening combined with computational simulations and synthesis. Here, we introduce a novel approach to active learning over a wide chemical space based on hypothesis learning. The study is conducted on the ~130,000 molecules in the QM9 dataset, with the goal of actively learning the formation enthalpy of the molecules. We construct multiple hypotheses based on possible relationships between structures and the functionalities of interest, and introduce these as mean functions of a Gaussian process. The approach thereby combines elements of symbolic regression methods, such as SISSO, with Bayesian optimization in a single framework. Although demonstrated for the QM9 dataset, this method is expected to be universally applicable to other datasets, with contents ranging from molecules to solid-state materials.
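As a rough illustration of the idea of using competing hypotheses as Gaussian process mean functions inside an active learning loop, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional toy descriptor standing in for a molecular representation, the `toy_oracle` target standing in for formation enthalpy, the two hypothetical mean functions in `HYPOTHESES`, the least-squares fit of the mean parameters (a simplification of a full Bayesian treatment), and the rule of preferring the hypothesis with the lowest average predictive variance.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Unit-amplitude squared-exponential kernel between 1-D point sets."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)


# Hypothetical structure-property hypotheses, written as feature maps
# whose linear combination forms the GP mean function (assumption).
HYPOTHESES = {
    "linear": lambda x: np.stack([np.ones_like(x), x], axis=1),
    "quadratic": lambda x: np.stack([np.ones_like(x), x, x**2], axis=1),
}


def gp_predict(x_train, y_train, x_test, features, length_scale=0.3, noise=1e-3):
    """GP posterior with a parametric mean fitted by least squares."""
    phi_train = features(x_train)
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    resid = y_train - phi_train @ coef
    # Kernel amplitude set to the residual variance (simplifying assumption),
    # so a better-fitting mean function yields lower predictive uncertainty.
    amp = max(float(resid.var()), 1e-8)
    K = amp * rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    Ks = amp * rbf_kernel(x_test, x_train, length_scale)
    mean = features(x_test) @ coef + Ks @ np.linalg.solve(K, resid)
    var = amp - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.clip(var, 0.0, None)


def toy_oracle(x):
    """Toy stand-in for an expensive property evaluation (not QM9 data)."""
    return 2.0 * x**2 - x + 0.1 * np.sin(8.0 * x)


def active_learning(x_pool, oracle, n_init=5, n_steps=8, seed=0):
    """Query the pool point with the highest variance under the preferred hypothesis."""
    rng = np.random.default_rng(seed)
    measured = rng.choice(len(x_pool), size=n_init, replace=False).tolist()
    picks = []
    for _ in range(n_steps):
        x_tr = x_pool[measured]
        y_tr = oracle(x_tr)
        unmeasured = np.ones(len(x_pool), dtype=bool)
        unmeasured[measured] = False
        best_name, best_score, best_var = None, np.inf, None
        for name, feats in HYPOTHESES.items():
            _, var = gp_predict(x_tr, y_tr, x_pool, feats)
            score = var[unmeasured].mean()  # average uncertainty over unexplored points
            if score < best_score:
                best_name, best_score, best_var = name, score, var
        picks.append(best_name)
        acquisition = np.where(unmeasured, best_var, -np.inf)  # never re-query a point
        measured.append(int(np.argmax(acquisition)))
    return measured, picks


if __name__ == "__main__":
    pool = np.linspace(-1.0, 1.0, 50)
    measured, picks = active_learning(pool, toy_oracle)
    print("queried indices:", measured)
    print("preferred hypothesis per step:", picks)
```

In this sketch the hypotheses compete at every step, and the one that currently explains the measured data best steers where the next evaluation is made. A real application would replace the toy descriptor with a molecular representation and would likely treat the mean-function parameters probabilistically rather than by least squares.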

*Acknowledgements: This effort (machine learning) is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences Data, Artificial Intelligence and Machine Learning at DOE Scientific User Facilities.

Presenters

  • Ayana Ghosh

    • Oak Ridge National Lab

Authors

  • Ayana Ghosh

    • Oak Ridge National Lab
  • Sergei V Kalinin

    • University of Tennessee, Knoxville
  • Maxim Ziatdinov

    • Oak Ridge National Lab