Automated Knowledge Graph Generation from Text for Synthesis of Energetic Materials
ORAL
Abstract
Over the past two decades, machine-learning algorithms have been developed and applied across a wide range of domains, including those related to shock compression. These developments include advances in computationally assisted synthesis planning and in natural language processing of text documents in the context of chemical energy. The objective of this work is to explore the intersection of these emerging research capabilities and to develop automatable approaches for extracting synthesis information for chemical storage from text documents, representing that information as knowledge graphs. A knowledge graph is composed of nodes and edges: nodes represent entities, such as chemical compounds, and edges represent relations between entities, for example solubility. The knowledge graph is generated automatically by a pipeline built on several open-source resources that identify entities, such as the reaction product or other compounds, and linguistic features, including coreferences. As a result, the graph is heterogeneous, containing both natural-language and chemical information. To confirm a portion of the information in the graph, it is linked to external databases; this step provides a means of checking edges between nodes that does not rely on a probabilistic model. Once the graph has been created, knowledge graph embedding techniques are used to recommend alternative synthesis pathways, with the recommendation algorithm trained on both the chemical and the semantic features found within the graph. While inspired by existing synthesis prediction frameworks, this work differs in that it applies information extraction algorithms to textual data to produce the database of synthesis information.
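To make the node-and-edge description concrete, the following is a minimal sketch of a heterogeneous knowledge graph stored as (head, relation, tail) triples. It is not the pipeline described in the abstract: the entity names, the relation labels, and the helper functions `build_graph` and `tails` are all hypothetical placeholders chosen for illustration, and the triples are hand-written rather than extracted from text.

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail). These are placeholders,
# not data extracted by the pipeline described in the abstract.
TRIPLES = [
    ("compound_A", "reacts_with", "compound_B"),    # chemical relation
    ("compound_A", "soluble_in", "solvent_S"),      # chemical relation
    ("compound_B", "soluble_in", "solvent_S"),
    ("reaction_1", "has_reactant", "compound_A"),
    ("reaction_1", "has_product", "compound_C"),
    ("the product", "coreferent_with", "compound_C"),  # linguistic relation
]


def build_graph(triples):
    """Index triples by head node: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def tails(graph, head, relation):
    """Return every tail linked to `head` by an edge labeled `relation`."""
    return [t for r, t in graph.get(head, []) if r == relation]


graph = build_graph(TRIPLES)
print(tails(graph, "compound_A", "soluble_in"))
print(tails(graph, "the product", "coreferent_with"))
```

Keeping chemical relations and linguistic relations such as coreference in the same triple store is what makes the graph heterogeneous in this sketch; an embedding model could then be trained on the full set of triples.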
Presenters
- Connor P O'Ryan, University of Maryland, College Park