Available at: https://digitalcommons.calpoly.edu/theses/1813
Date of Award
MS in Computer Science
Alexander M. Dekhtyar
Bacterial contamination of water sources is a serious health risk, and the sources of the bacterial strains must be identified to keep people safe. This project is the result of a collaborative effort at Cal Poly to develop a new library-dependent Microbial Source Tracking (MST) method for determining sources of fecal contamination in the environment. The library used in this study is the Cal Poly Library of Pyroprints (CPLOP). Building CPLOP requires students to collect fecal samples from a multitude of sources in the San Luis Obispo area. Pyroprinting, a novel method developed by biologists at Cal Poly, is then applied to two intergenic regions of the E. coli isolates from these samples to obtain their fingerprints, which are stored in the CPLOP database. In our study, we consider any E. coli isolates whose fingerprints match above a certain threshold to belong to the same bacterial strain. However, no MST method has yet produced an acceptable level of accuracy. In this thesis, we propose a two-step MST classifier that combines two previous works developed specifically for CPLOP: pyro-DBSCAN and k-RAP. We call our classifier HAP (Hybrid Algorithm for Pyroprints). The classifier works as follows. Given an unknown isolate, the first step clusters the known isolates in the library and compares the unknown isolate against the resulting clusters. If the isolate falls into a cluster, it is classified as the dominant species of that cluster. Otherwise, we apply the k-Nearest Clusters Algorithm to the isolate to determine its final classification. Ultimately, HAP provides a set of 16 decision strategies that identify the host species of an unknown sample with high accuracy.
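The two-step decision described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the clusters are assumed to be precomputed (for example, by pyro-DBSCAN), fingerprints are assumed to be plain numeric vectors, Pearson correlation is assumed as the similarity measure, and the names `similarity`, `dominant_species`, and `classify`, as well as the default `threshold` and `k` values, are hypothetical choices made only for illustration. The 16 decision strategies are not modeled here.

```python
from collections import Counter
from statistics import mean


def similarity(a, b):
    """Pearson correlation between two equal-length fingerprints.

    Assumption: the abstract only says fingerprints "match above a
    certain threshold"; Pearson correlation is used here as a stand-in.
    """
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def dominant_species(cluster):
    """Most common host species among a cluster's (fingerprint, species) pairs."""
    return Counter(species for _, species in cluster).most_common(1)[0][0]


def classify(unknown, clusters, threshold=0.99, k=3):
    """Hypothetical two-step classification of one unknown fingerprint."""
    # Step 1: the isolate "falls into" a cluster if it matches any member
    # above the threshold; the first such cluster decides the label.
    for cluster in clusters:
        if any(similarity(unknown, fp) >= threshold for fp, _ in cluster):
            return dominant_species(cluster)

    # Step 2: otherwise rank clusters by mean similarity to the isolate
    # and let the k nearest clusters vote with their dominant species.
    ranked = sorted(
        clusters,
        key=lambda c: mean(similarity(unknown, fp) for fp, _ in c),
        reverse=True,
    )
    votes = Counter(dominant_species(c) for c in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy library with two clusters; the values are invented for illustration.
cow_cluster = [([1, 2, 3, 4], "cow"), ([1, 2, 3, 5], "cow")]
dog_cluster = [([4, 3, 2, 1], "dog"), ([5, 3, 2, 1], "dog"), ([4, 3, 2, 1.5], "human")]
library = [cow_cluster, dog_cluster]

in_cluster = classify([2, 4, 6, 8], library)        # resolved in step 1
fallback = classify([1, 3, 2, 4], library, k=1)     # resolved in step 2
```

In this sketch the first isolate correlates perfectly with a member of the first cluster and is labeled in step 1, while the second isolate matches no member above the threshold and is labeled by its single nearest cluster.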