Expanding the Molecular Alphabet of DNA Data Storage Systems with Single-molecule Nanopore Readout
ORAL
Abstract
DNA is a promising next-generation data storage medium, but the recording latency and synthesis cost of DNA oligos built from the four natural nucleotides remain high. In this talk, we describe a new DNA storage system that uses an extended 11-letter molecular alphabet combining natural and chemically modified nucleotides. Experimental results on a library of 77 oligo sequences show that different combinations of monomers can be readily discriminated using single-molecule detection with MspA nanopores. We further demonstrate full nanopore sequencing of hybrid synthetic DNA oligos on commercial Oxford Nanopore sequencers by developing a custom neural network architecture to classify raw current signals, yielding an average accuracy exceeding 60%, which is 39 times higher than random guessing. Molecular dynamics simulations show that most chemically modified nucleotides do not dramatically disrupt the DNA double helix, which suggests that the extended alphabet is compatible with PCR-based random-access data retrieval. More broadly, these methodologies offer a path forward for new implementations of molecular recorders.
*The work was funded by the NSF+SRC SemiSynBio program under agreement number 1807526 and NSF grants 1618366 and 2008125. A.A. and M.C. acknowledge support from NHGRI/NIH via grant R21-HG011741. Supercomputer time was provided through the Blue Waters petascale system at the University of Illinois.
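As a rough illustration of why a larger alphabet matters, the sketch below compares the idealized information content per symbol of the 4-letter natural alphabet with that of an 11-letter alphabet. It is only an upper bound under stated assumptions: every letter is taken to be usable at every position and read without error, and no error-correction or sequence-constraint overhead is counted. The function names and the 1000-bit payload are hypothetical and are not taken from the talk.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Idealized information content of one symbol, in bits (log2 of the alphabet size)."""
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two letters")
    return math.log2(alphabet_size)


def symbols_needed(payload_bits: int, alphabet_size: int) -> int:
    """Minimum number of symbols needed to hold payload_bits, ignoring all overhead."""
    return math.ceil(payload_bits / bits_per_symbol(alphabet_size))


natural = bits_per_symbol(4)     # four natural nucleotides
extended = bits_per_symbol(11)   # assumed: all 11 letters usable at every position
gain = extended / natural        # idealized density gain per symbol

print(f"natural alphabet:  {natural:.3f} bits/symbol")
print(f"extended alphabet: {extended:.3f} bits/symbol")
print(f"idealized gain:    {gain:.2f}x")
print("symbols for a 1000-bit payload:",
      symbols_needed(1000, 4), "vs", symbols_needed(1000, 11))
```

In practice the achievable gain depends on how reliably each letter can be synthesized and read, so these numbers should be treated as a ceiling rather than a measured result.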
Presenters
- Kasra Tabatabaei, University of Illinois at Urbana-Champaign