Unbiasing machine learning for molecular dynamics: emphasising out-of-equilibrium geometries using clustering
ORAL
Abstract
Machine learning (ML) force fields (FF) have become an increasingly popular tool in computational physics due to their speed and accuracy. By construction, ML models are often biased towards the more abundant "close-to-equilibrium" states. A small mean error therefore does not guarantee accurate predictions for rare "out-of-equilibrium" configurations, which are typically underrepresented in reference datasets.
We propose a method to train unbiased ML FF that yields equally accurate predictions regardless of the density of the training data. To achieve this, we divide datasets into smaller subsets (clusters) based on data similarities. The quality of an ML model is then evaluated on each individual cluster, thereby revealing problematic cases. Representative data from each problematic cluster are added to the training set, and the ML model is retrained. The improved learning process flattens the prediction errors across the reference data. The method is applied to molecular trajectory datasets, decreasing the largest errors of the resulting ML FF by up to an order of magnitude.
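The cluster, evaluate, augment and retrain loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the descriptor, or the model, so plain k-means, a one-dimensional toy coordinate, and Gaussian kernel ridge regression are stand-ins, and every function name and parameter here (`kmeans`, `fit_krr`, `cluster_errors`, `train_with_cluster_feedback`, `n_add`, and so on) is hypothetical. Picking a random sample from the worst cluster is likewise only one possible reading of "representative data".

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering); one label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def gaussian_kernel(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma**2))

def fit_krr(X, y, sigma=0.3, lam=1e-6):
    """Kernel ridge regression (assumed model); returns a predictor."""
    K = gaussian_kernel(X, X, sigma) + lam * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return lambda Xq: gaussian_kernel(Xq, X, sigma) @ alpha

def cluster_errors(model, X, y, labels, k):
    """Mean absolute error of the model on each cluster."""
    err = np.abs(model(X) - y)
    return np.array([err[labels == j].mean() if np.any(labels == j) else 0.0
                     for j in range(k)])

def train_with_cluster_feedback(X, y, k=8, n_init=20, n_add=5,
                                n_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, seed=seed)
    train_idx = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    history = []  # worst per-cluster error measured before each augmentation
    for _ in range(n_rounds):
        model = fit_krr(X[train_idx], y[train_idx])
        errs = cluster_errors(model, X, y, labels, k)
        history.append(float(errs.max()))
        worst = int(errs.argmax())
        # Points of the worst cluster that are not yet in the training set.
        candidates = np.setdiff1d(np.flatnonzero(labels == worst), train_idx)
        if candidates.size == 0:
            break
        picked = rng.choice(candidates, size=min(n_add, candidates.size),
                            replace=False)
        train_idx.extend(int(i) for i in picked)
    model = fit_krr(X[train_idx], y[train_idx])  # final retraining
    return model, train_idx, history

# Toy data: samples concentrated near x = 0 ("close to equilibrium"),
# with a Morse-like energy so the sparse tails are the hard region.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.5, size=(300, 1))
y = (1.0 - np.exp(-X[:, 0])) ** 2
model, train_idx, history = train_with_cluster_feedback(X, y)
```

Tracking the maximum per-cluster error in `history`, rather than the global mean error, is what makes sparsely populated regions visible: a cluster with few members contributes little to the mean but can dominate the maximum.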
Presenters
- Grégory Cordeiro Fonseca, University of Luxembourg