Impact of Dataset Uncertainties on Machine Learning Model Predictions: The Example of Polymer Glass Transition Temperatures

ORAL

Abstract

Data-driven methods are seeing a revival and are deeply influencing multiple aspects of materials research. Materials property data from computations or experiments are being used to create surrogate models using machine learning (ML) techniques. These models can provide rapid predictions of the properties of new materials at a fraction of the cost of actual experimentation or computation. Moreover, a variety of techniques are being explored to “invert” the property prediction pipeline, allowing materials to be designed with a desired set of target property values. The quality of a surrogate model depends on the quality (and quantity) of the dataset used to train it. Often, different experimental studies report different values for the same property of the same material. This may be due to variations in measurement techniques, conditions, and sample quality, among other factors. How should one treat these variances, and what is their impact? This question needs a specific answer, since it is paramount to developing a good prediction model and helps in understanding the model's limitations.

*The authors acknowledge support of this work by the Toyota Research Institute through the Accelerated Materials Design and Discovery program.

Presenters

  • Anurag Jha

    • Georgia Institute of Technology

Authors

  • Anurag Jha

    • Georgia Institute of Technology
  • Anand Chandrasekaran

    • Georgia Institute of Technology
  • Chiho Kim

    • Georgia Institute of Technology
  • Ramamurthy Ramprasad

    • Georgia Institute of Technology
    • University of Connecticut
    • School of Materials Science and Engineering, Georgia Institute of Technology
    • Materials Science and Engineering, Georgia Institute of Technology