Enhanced Machine Learning Models for Structure-Property Mapping with Principal Covariates Regression
ORAL
Abstract
Data analyses based on linear methods constitute the simplest, most robust, and most transparent approaches to automatically processing large amounts of data when building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and can be used to reveal structure-property relations in simple-to-interpret, low-dimensional maps. Here we introduce a kernelized version of PCovR and demonstrate its performance in revealing and predicting structure-property relations in chemistry and materials science. Additionally, we demonstrate the improved performance that results from incorporating PCovR into two popular data selection methods, CUR and Farthest Point Sampling, which iteratively identify the most diverse samples and the most discriminating features.
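To make the interpolation between principal component analysis and linear regression concrete, the following is a minimal sketch of linear (not kernelized) PCovR in its sample-space form. It is not taken from the abstract: the function name `pcovr_latent`, the mixing parameter `alpha`, and the omission of any relative scaling between the structure matrix `X` and the property matrix `Y` are assumptions made for illustration. With `alpha = 1` the latent projection reduces to PCA scores, and with `alpha = 0` it spans the least-squares prediction of `Y`.

```python
import numpy as np


def pcovr_latent(X, Y, alpha, n_components):
    """Sample-space sketch of linear PCovR; returns the latent projection T.

    X: (n_samples, n_features) array; Y: (n_samples, n_properties) array.
    alpha: mixing parameter in [0, 1] (1 -> PCA, 0 -> linear regression).
    Assumption: no relative normalization of X and Y is applied here.
    """
    # Center both blocks, as is usual for PCA and linear regression.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    # Least-squares approximation of Y from X (the alpha = 0 limit).
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    # Modified Gram matrix mixing the PCA term and the regression term.
    K = alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]

    # Scale eigenvectors by the square roots of their eigenvalues;
    # clip guards against tiny negative values from round-off.
    return evecs * np.sqrt(np.clip(evals, 0.0, None))
```

Intermediate values of `alpha` trade off how much of the variance in `X` the map retains against how well the latent coordinates regress `Y`. The kernelized variant described in the abstract would replace the linear Gram matrix `X @ X.T` with a kernel matrix, which this sketch does not attempt.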
*MC, RKC, and BAH acknowledge funding from the ERC Horizon 2020 grant agreement no. 677013-HBMAP. GF acknowledges support from the SCCER Efficiency of Industrial Processes and from the European Center of Excellence MaX, Materials at the Exascale (GA No. 676598). EE acknowledges support from Trinity College, Cambridge and the Swiss National Computing Centre.
Presenters
- Rose K. Cersonsky, Ecole Polytechnique Federale de Lausanne