2017 Annual Meeting
(609h) 40,000+ Genomes and Counting: Computational Lessons from Building a Giant Culture Collection
Biological sequence data is as rich and complex as it is inexpensive to produce. At the same time, the rapid pace of DNA sequencing technology development quickly renders current best practices and conventions for storing and organizing biological sequence information obsolete or irrelevant. This is a particular challenge for companies that specialize in microbiome science and amass large collections of microbial isolates: how can we account for this ever-changing environment and maintain a stable data infrastructure at scale? At AgBiome, we have selected a technology stack for genomic data science that enables rapid development, persistent relevance of analyses, and reproducibility. We employ a suite of open-source tools, including Nextflow, Docker, Mash/Sourmash, and Jupyter. In this talk, I will contrast data analysis in academia with data analysis in industry and present a case study involving one of AgBiome's products under development to demonstrate how we cope with the head-spinning dynamism of genomics.
