Available at: https://digitalcommons.calpoly.edu/theses/959
MS in Computer Science
Pyroprinting is a novel, library-based microbial source tracking method developed by the Biology department at Cal Poly, San Luis Obispo. The method consists of two parts: (1) a collection of bacterial fingerprints, called pyroprints, from known host species, and (2) a method for pyroprint comparison. Currently, the Cal Poly Library of Pyroprints (CPLOP), a web-based database application, provides storage and analysis of over 10,000 pyroprints. This number is growing quickly as students and researchers continue to use pyroprinting for research. Biologists conducting research using pyroprinting rely on methods for partitioning collected bacterial isolates into bacterial strains. Clustering algorithms are commonly used for bacterial strain analysis in computational biology. Unfortunately, agglomerative hierarchical clustering, a commonly used clustering algorithm, is inadequate given the nature of data collection for pyroprinting. While the clusters produced by agglomerative hierarchical clustering are acceptable, pyroprinting requires a method of analysis that is scalable and that incorporates useful metadata into the clustering process. We propose ontology-based hierarchical clustering (OHClust!), a modification of agglomerative hierarchical clustering that expresses metadata-based relationships as an ontology and uses that ontology to direct the order in which hierarchical clustering algorithms analyze the data. In this thesis, the strengths and weaknesses of OHClust! are discussed, and its performance is analyzed in comparison to agglomerative hierarchical clustering.
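The idea of letting an ontology direct the order of clustering can be illustrated with a minimal sketch. It is not the OHClust! implementation: the abstract does not specify the ontology format, the pyroprint comparison metric, or the stopping rule, so everything below is an assumption made for illustration. The ontology is modeled as a nested dictionary of hypothetical metadata values (host species, then sample type), pyroprints are stood in for by short numeric tuples, Euclidean distance with average linkage replaces the actual comparison method, and the function names and `threshold` parameter are invented here.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean pairwise Euclidean distance between two clusters of points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(clusters, threshold):
    """Plain agglomerative clustering: merge the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def ontology_ordered_clustering(node, threshold):
    """Cluster a metadata tree bottom-up (assumed ontology shape).

    `node` is either a list of points (isolates that share all metadata)
    or a dict mapping a metadata value to a child node. Isolates with the
    same metadata are clustered first; the resulting clusters are then
    passed up and clustered against their siblings.
    """
    if isinstance(node, dict):
        partial = []
        for child in node.values():
            partial.extend(ontology_ordered_clustering(child, threshold))
        return agglomerate(partial, threshold)
    return agglomerate([[p] for p in node], threshold)


# Hypothetical toy data: host species -> sample type -> isolates.
isolates = {
    "cow": {"fecal": [(0.0, 0.0), (0.1, 0.0)], "rectal": [(0.0, 0.1)]},
    "dog": {"fecal": [(5.0, 5.0), (5.1, 5.0)]},
}
clusters = ontology_ordered_clustering(isolates, threshold=1.0)
print(sorted(len(c) for c in clusters))
```

In this sketch each call to `agglomerate` sees only the clusters produced within one branch of the metadata tree, so the expensive all-pairs comparison runs on small groups before their results are combined. That is one plausible reading of how metadata could make the analysis more scalable, offered here as an assumption rather than as the design described in the thesis.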