Graph identification of proteins in tomograms (GRIP-Tomo)
ORAL
Abstract
In this study, we present a pattern-mining method based on network theory that identifies protein structures or complexes from synthetic volume densities without predefined templates or human bias in refinement. We hypothesized that the topological connectivity of a protein structure is invariant and distinctive enough to identify the protein from distorted volume-density data. Three-dimensional densities of a protein or complex from simulated tomographic volumes were transformed into mathematical graphs as observables. We systematically introduced data distortions or defects into the simulated volumes, such as missing fullness of data, the tumbling effect, and the missing wedge effect, and varied the distance cutoff (in pixels) to capture how the connectivity between density cluster centroids changes in the presence of defects. A similarity score between the graphs from the simulated volumes and the graphs derived from the point data of the physical protein structures was calculated by comparing network-theory order parameters, including node degree, betweenness centrality, and graph density. By capturing the essential topological features that define the heterogeneous morphologies of a network, we accurately identified proteins and homo-multimeric complexes from ten topologically distinctive samples without noise. Our approach lays the groundwork for interpretable pattern mining that classifies single-domain protein native topologies and distinguishes single-domain proteins from multimeric complexes within noisy volumes.
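The graph construction and comparison described above can be illustrated with a minimal sketch. It is not the GRIP-Tomo implementation: the function names (`build_graph`, `features`, `similarity`), the choice of mean degree, mean betweenness, and density as the feature vector, and the relative-difference scoring formula are all assumptions made for illustration, since the abstract does not specify how the similarity score is computed. Centroids are assumed to be given already as 3D points, and each graph is assumed to be non-empty.

```python
import math
from collections import deque
from itertools import combinations


def build_graph(centroids, cutoff):
    """Link every pair of centroids whose distance is within `cutoff` (pixels)."""
    adj = {i: set() for i in range(len(centroids))}
    for i, j in combinations(range(len(centroids)), 2):
        if math.dist(centroids[i], centroids[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graph_density(adj):
    """Fraction of possible edges that are present."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 0.0 if n < 2 else 2 * edges / (n * (n - 1))


def betweenness(adj):
    """Normalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    n = len(adj)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice, so divide by (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def features(adj):
    """Hypothetical feature vector: mean degree, mean betweenness, density."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    mean_bc = sum(betweenness(adj).values()) / n
    return (mean_degree, mean_bc, graph_density(adj))


def similarity(adj_a, adj_b):
    """Hypothetical score in [0, 1]; 1.0 means identical feature vectors."""
    diffs = []
    for a, b in zip(features(adj_a), features(adj_b)):
        top = max(a, b)
        diffs.append(0.0 if top == 0 else abs(a - b) / top)
    return 1.0 - sum(diffs) / len(diffs)


# Toy data: a reference chain of centroids and a copy with one centroid lost.
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
distorted = reference[:3]
g_ref = build_graph(reference, cutoff=1.5)
g_obs = build_graph(distorted, cutoff=1.5)
print(similarity(g_ref, g_ref), similarity(g_ref, g_obs))
```

Sweeping `cutoff` over a range of pixel values and repeating the comparison would mimic, in a simplified way, how varying the distance cutoff probes connectivity under defects.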
*The research was performed using resources available through Research Computing at Pacific Northwest National Laboratory (PNNL) under Contract DE-AC05-76RL01830. A portion of the research was conducted under the Laboratory Directed Research and Development Program at PNNL. The research was also supported by a project award (DOI: https://doi.org/10.46936/intm.proj.2021.60121/60001438) from the Environmental Molecular Sciences Laboratory, sponsored by the Biological and Environmental Research program under Contract No. DE-AC05-76RL01830, and in part by the DOE Office of Workforce Development for Teachers and Scientists under the Community College Internship Program to I.T.G.
Publication: A.D. George and M.S. Cheung, "Graph identification of proteins in tomographs (GRIP-Tomo)," Provisional Application No. 63/353,974, 2022. A.D. George, D. Kim, T.H. Moser, I.T. Gildea, J.E. Evans, M.S. Cheung, "Graph identification of proteins in tomographs (GRIP-Tomo)"