Developing Databases for Polymer Informatics
ORAL
Abstract
One significant barrier to the adoption of polymer informatics is a lack of large FAIR (Findable, Accessible, Interoperable, Reusable) databases. To overcome this barrier, we developed pipelines to harness the vast quantities of valuable experimental polymer data trapped in the literature. In our first effort, we built the largest Flory-Huggins chi parameter database using crowdsourcing and found that the burden of reviewing papers could be lessened by training a classifier to identify promising articles. To further reduce human input, we coupled natural language processing software with specially designed software modules to extract glass transition temperatures; ultimately, we extracted over 250 glass transition temperatures. All of the resulting data are freely available at the Polymer Property Predictor and Database website (http://pppdb.uchicago.edu). During this process, we found that identifying polymer names within the literature was a key problem, as polymers are referred to by common names, sample names, labels, and so on. We subsequently explored named entity recognition to tackle this problem. To further extend our databases, we are working on allowing them to accept user-submitted data.
Presenters
Debra Audus
- National Institute of Standards and Technology, Gaithersburg, MD