Computer Science Department
BS in Software Engineering
Pyroprinting is a novel technique used by the Department of Biological Sciences to obtain “fingerprints” from the DNA of E. coli isolates in order to categorize them into strains. To determine how many false positives the pyroprinting process produces, isolates with matching pyroprints were sequenced to check whether their underlying alleles also match. If the alleles match, the isolates are the same strain and the pyroprint match is a true positive; if they do not, the isolates are different strains and the match is a false positive. To this end, 100 isolates, each labeled with a nucleotide identifier, were sequenced. The resulting five million-plus sequences were analyzed with a program built on Hadoop. The program groups the sequences into per-isolate buckets and analyzes each bucket to identify false positives, which gives a general indicator of the efficacy of pyroprinting. The Hadoop implementation proved reliable and highly scalable. This method of analysis applies broadly within bioinformatics and may also be useful in other industries. The results of the experiment are still being analyzed to determine the frequency of false positives and how that frequency should inform the use of pyroprinting.
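Grouping sequences into isolate buckets maps naturally onto MapReduce. The map phase keys each read by its isolate's identifier. The shuffle then gathers each isolate's reads into one bucket, and the reduce phase summarizes that bucket. The sketch below simulates those phases in plain Python rather than on Hadoop. It rests on several hypothetical assumptions. Each read begins with a fixed-length 6-nt isolate identifier (`BARCODE_LEN`). An isolate's alleles are the distinct sequences observed at least twice. The groups of isolates sharing a pyroprint are supplied as input. The names `map_read`, `shuffle`, `reduce_isolate`, and `classify_pairs` are illustrative and are not taken from the project's program.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical: assumes every read starts with a 6-nt isolate identifier.
BARCODE_LEN = 6


def map_read(read):
    """Map phase: emit (isolate_identifier, remaining_sequence)."""
    return read[:BARCODE_LEN], read[BARCODE_LEN:]


def shuffle(pairs):
    """Group mapped values by key, as Hadoop does between map and reduce."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_isolate(sequences, min_count=2):
    """Reduce phase: call an isolate's alleles as the sequences seen at least
    min_count times (a hypothetical threshold to filter sequencing noise)."""
    counts = defaultdict(int)
    for seq in sequences:
        counts[seq] += 1
    return frozenset(seq for seq, n in counts.items() if n >= min_count)


def classify_pairs(alleles_by_isolate, pyroprint_groups):
    """For each pair of isolates sharing a pyroprint, label the match a true
    positive if their allele sets agree and a false positive otherwise."""
    results = {}
    for group in pyroprint_groups:
        for a, b in combinations(sorted(group), 2):
            same = alleles_by_isolate[a] == alleles_by_isolate[b]
            results[(a, b)] = "true_positive" if same else "false_positive"
    return results


if __name__ == "__main__":
    reads = ["AAAAAAACGT", "AAAAAAACGT", "CCCCCCACGT",
             "CCCCCCACGT", "GGGGGGTTTT", "GGGGGGTTTT"]
    buckets = shuffle(map(map_read, reads))
    alleles = {iso: reduce_isolate(seqs) for iso, seqs in buckets.items()}
    # Hypothetical input: all three isolates produced the same pyroprint.
    print(classify_pairs(alleles, [{"AAAAAA", "CCCCCC", "GGGGGG"}]))
```

In this toy input, isolates `AAAAAA` and `CCCCCC` share the allele `ACGT`, so their pyroprint match counts as a true positive. `GGGGGG` carries `TTTT`, so both of its matches count as false positives. On a real cluster the map and reduce functions would run as separate distributed tasks, and the framework would perform the shuffle. That separation is what lets this kind of analysis scale to millions of reads.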