A massive dataset of synthesis-friendly hypothetical polymers
ORAL
Abstract
Polymer informatics is an emerging field in materials science. It aims to build data-driven models that predict polymer properties near-instantaneously, and to use this capability to screen a large set of candidate polymers and identify promising ones from their predicted properties. For such screening to be useful, however, the candidate set must consist of polymers that can actually be synthesized. Starting from ~13k experimentally known polymers, we identified two distinct pathways for generating a dataset of synthesis-friendly hypothetical polymers: (1) a combinatorial assembly of retrosynthetic fragments obtained from the ~13k polymers, and (2) a framework that treats polymers as graphs and applies graph-to-graph translations. The result is a massive dataset of 100 million hypothetical but synthesis-friendly polymers. Additionally, we quantify the synthetic feasibility of each polymer as a score and demonstrate that a large portion of the generated polymers are synthesis-ready. This database can be used (1) for direct screening with available property prediction models, and (2) within unsupervised approaches to train generative models, to enable and accelerate polymer discovery.
*Supported by Office of Naval Research MURI (N00014-17-1-2656) and regular (N00014-20-2175) grants.
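The combinatorial-assembly pathway mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the fragment lists, the polyester pairing rule, and the function name `assemble_polyesters` are all hypothetical, chosen only to show how pairing fragments from two reactive classes multiplies the number of candidates. Repeat units are written as SMILES-like strings with `[*]` marking the chain ends.

```python
from itertools import product

# Hypothetical toy fragments (assumption, not taken from the ~13k-polymer set):
# diol-derived and diacid-derived pieces written so that simple concatenation
# yields an ester-linked repeat unit.
DIOL_FRAGMENTS = ["OCCO", "OCCCCO"]
DIACID_FRAGMENTS = ["C(=O)CCCCC(=O)", "C(=O)c1ccc(cc1)C(=O)"]


def assemble_polyesters(diols, diacids):
    """Return one repeat-unit string per diol/diacid pair, with [*] chain ends."""
    return [f"[*]{diol}{diacid}[*]" for diol, diacid in product(diols, diacids)]


candidates = assemble_polyesters(DIOL_FRAGMENTS, DIACID_FRAGMENTS)
for repeat_unit in candidates:
    print(repeat_unit)
```

With m fragments in one class and n in the other, this pairing yields m × n candidates, which suggests how a modest fragment library can grow into a very large hypothetical set. A real workflow would also need chemistry-aware validation of each product, which this sketch omits.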
Presenters
- Arunkumar Rajan, Georgia Institute of Technology