The CRISPR/Cas system has emerged as a powerful genome-editing tool for human gene therapy and the rapid construction of cellular factories. However, the question of where to integrate exogenous enzymes and pathways remains unanswered. Selecting a site for gene integration requires weighing complex criteria, such as factors involved in CRISPR/Cas-mediated integration, genetic stability, and gene expression, and therefore usually requires strenuous characterization of sites at particular or different chromosomal locations. To address these issues, we developed CRISPR-COPIES, a COmputational Pipeline for the Identification of CRISPR/Cas-facilitated intEgration Sites, which can discover neutral integration sites in a genome-wide manner for any organism with a genome reported in NCBI and for any CRISPR/Cas system. The tool applies ScaNN, a state-of-the-art method for embedding-based nearest neighbor search, for fast and accurate off-target search and can identify intergenic sites within minutes. Homology mapping is also incorporated to determine homologs of essential genes in the target organism, thereby locating stable integration sites. We demonstrate the utility of the software through characterization of sites in Cupriavidus necator, Saccharomyces cerevisiae, and HEK293T cells. We anticipate that CRISPR-COPIES will serve as a useful tool for targeted DNA integration and aid in the characterization of synthetic biology toolkits, rapid strain development to produce valuable biochemicals, and gene therapy. CRISPR-COPIES is available as a user-friendly web application and a command line tool.
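
To make the idea of an embedding-based off-target search concrete, the following is a minimal sketch and not the CRISPR-COPIES implementation. It assumes guide sequences are embedded as one-hot vectors, so that the squared Euclidean distance between two embeddings equals twice their number of mismatches, and it replaces ScaNN's approximate search with an exhaustive scan that is only practical for tiny candidate sets. The names `one_hot`, `squared_distance`, and `nearest_off_targets` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[int]:
    """Embed a DNA sequence as a flat one-hot vector (4 entries per base)."""
    vec: List[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def squared_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_off_targets(
    guide: str, candidates: List[str], max_mismatches: int
) -> List[Tuple[str, int]]:
    """Return candidates within max_mismatches of the guide, closest first.

    Assumption: with one-hot embeddings, each mismatched position adds
    exactly 2 to the squared distance, so mismatches = distance // 2.
    This exhaustive scan stands in for an approximate index such as ScaNN.
    """
    guide_vec = one_hot(guide)
    hits: List[Tuple[str, int]] = []
    for cand in candidates:
        if len(cand) != len(guide):
            continue  # only compare sites of the same length as the guide
        mismatches = squared_distance(guide_vec, one_hot(cand)) // 2
        if mismatches <= max_mismatches:
            hits.append((cand, mismatches))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    genome_sites = [
        "TTTTACGTACGTACGTACGT",  # differs from the guide at three positions
        "ACGTACGTACGTACGTACGA",  # differs at the last position only
        "GGGGGGGGGGGGGGGGGGGG",  # far from the guide
    ]
    for site, mismatches in nearest_off_targets(guide, genome_sites, 3):
        print(site, mismatches)
```

In practice, a genome contains far too many candidate sites for an exhaustive scan, which is the motivation for an approximate nearest neighbor index over the embeddings.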
