DOI: https://doi.org/10.15368/theses.2021.42
Available at: https://digitalcommons.calpoly.edu/theses/2365
Date of Award
5-2021
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Alexander Dekhtyar
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Microbial Source Tracking (MST) is a field of study that attempts to identify the source of fecal contamination in waterways in order to assist with the development of remediation strategies. Biologists at the Cal Poly Center for Applications in Biotechnology (CAB) are developing a new MST method using microbes from the genus Bacteroides. Bacteroides species are host-specific microorganisms that can theoretically be used to trace back to a single host species. After fecal samples are collected, biologists use Next-Generation Sequencing (NGS) techniques to obtain only the genetic sequences of microorganisms belonging to the phylum Bacteroidetes. Investigators hypothesize that similar sequences belong to the same phylogenetic group (i.e., the same genus) and can therefore be computationally clustered. Each cluster of related sequences, typically 97% similar, is called an Operational Taxonomic Unit (OTU). Theoretically, an OTU acts as a molecular signature that can be traced back to a specific host genus. This thesis presents LOTUS, the Library of OTUs, a web-based computational tool for the preliminary investigation of the use of the Bacteroides OTU library as an MST method. This work discusses the four contributions of LOTUS: a database design that accurately models OTUs and the underlying relationships necessary for source tracking, a pipeline to create OTUs from raw sequencing reads, a method of assigning taxonomy to OTUs, and a web-based user interface. In preliminary testing with a reference library of twelve samples, LOTUS produced 1,431 OTUs, of which 891 were single-source (OTUs derived from sequences from a single host species). Using these OTUs, LOTUS produced accurate taxonomic matches for four of five unknown test samples, showing promise for using OTUs as an MST method.
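The idea of grouping sequences at 97% similarity and then checking whether an OTU is single-source can be illustrated with a small sketch. This is not the LOTUS pipeline, which the abstract does not describe in detail. It is a toy greedy clustering that assumes pre-aligned, equal-length sequences and uses per-position identity as the similarity measure. The names `identity`, `cluster_otus`, and `single_source` are hypothetical, chosen only for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions.

    Assumption: sequences are already aligned and of equal length.
    Real pipelines compute similarity from a proper alignment.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.97):
    """Greedy clustering of (sequence, host) pairs into OTUs.

    Each read joins the first OTU whose seed sequence it matches at
    or above the threshold; otherwise it seeds a new OTU.
    """
    otus = []
    for seq, host in reads:
        for otu in otus:
            if identity(seq, otu["seed"]) >= threshold:
                otu["hosts"].add(host)
                otu["size"] += 1
                break
        else:
            # No existing OTU was similar enough, so start a new one.
            otus.append({"seed": seq, "hosts": {host}, "size": 1})
    return otus


def single_source(otus):
    """Return the OTUs whose sequences all came from one host species."""
    return [otu for otu in otus if len(otu["hosts"]) == 1]
```

In this sketch, an OTU containing reads from more than one host is excluded by `single_source`, which mirrors the abstract's distinction between the 1,431 OTUs produced and the 891 that were single-source. The result of greedy clustering depends on read order, so a real tool would typically sort or otherwise prioritize reads before clustering.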