2025 AIChE Annual Meeting

(182b) High-Accuracy Nanopore Sequencing to Quantify Replication Fidelity of Unnatural Base Pairs

Authors

Jayson R. Sumabat, University of Washington
Jorge A. Marchand, University of California, Berkeley
Innovations in sequencing, synthesis, and manipulation of nucleic acids, built around the 4-letter DNA alphabet (A, T, G, C), have driven major advances in medicine, genomics, and synthetic biology. Yet research over the last three decades has shown that expanding the genetic alphabet to six or more letters is possible, offering new opportunities for biotechnology. Unnatural base-pairing xeno nucleic acids (ubp XNAs) are synthetic nucleotides that maintain base-pairing complementarity while remaining orthogonal to the natural bases. These ubp XNAs exhibit diverse structures, ranging from isomers of standard bases (isoG:isoC) to entirely novel hydrophobic pairs. However, the absence of next-generation sequencing tools for ubp XNAs has prevented high-throughput omics studies, limiting research progress. Further, replication errors often revert ubp XNAs to natural bases, posing a significant challenge for routine molecular biology workflows and for faithful retention of ubp XNAs in vivo.

Here, we present next-generation sequencing techniques to sequence ubp XNAs with high accuracy. Using nanopore sequencing, we train recurrent neural network models to sequence hydrogen-bonding and non-hydrogen-bonding ubp XNAs, achieving accuracy of up to 99%. We then apply these models to study ubp XNA loss during replication by tailoring models to detect known replication error modes. By combining theory, simulations, and high-throughput assays, we assess the replication fidelity landscape of diverse ubp XNAs and provide new insights into replication errors. Using multiplexed condition screening (polymerase, pH, nucleotide concentration, etc.), we also report conditions that significantly enhance replication fidelity of the highly error-prone isoG:Me-isoC pair in 6-letter PCR (>98% fidelity per replication). By developing high-throughput sequencing techniques and answering open questions in ubp XNA replication, we are closing critical technology and knowledge gaps in the field, accelerating the development of xeno nucleic acid technologies for synthetic biology.
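
To put a per-replication fidelity figure in context, the short sketch below shows how such a number compounds over repeated replications. It is an illustration only, not the authors' analysis: it assumes each replication retains the unnatural base pair independently and with the same probability, so that retention after n replications is fidelity raised to the power n. The function names and the example value of 20 replications are hypothetical.

```python
def retention_after_replications(fidelity: float, n: int) -> float:
    """Fraction of strands still carrying the unnatural pair after n replications.

    Assumes (for illustration) a constant, independent per-replication fidelity.
    """
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return fidelity ** n


def per_replication_fidelity(retention: float, n: int) -> float:
    """Invert the same model: per-replication fidelity from observed retention."""
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return retention ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical example: 98% fidelity per replication over 20 replications.
    kept = retention_after_replications(0.98, 20)
    print(f"retention after 20 replications: {kept:.3f}")
    # Round trip: recover the per-replication fidelity from that retention.
    print(f"recovered fidelity: {per_replication_fidelity(kept, 20):.3f}")
```

Under this simplified model, 98% fidelity per replication leaves roughly two thirds of strands carrying the unnatural pair after 20 replications, which is why small gains in per-replication fidelity matter for multi-cycle workflows.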