Date of Award
MS in Computer Science
Timothy J. Kearns
This thesis describes a computationally efficient method for identifying candidate DNA primer sequences. DNA sequencing primers are a critical element of polymerase chain reaction (PCR) and DNA sequence analysis. A variety of methods for deriving DNA primers exist, but they are often computationally intensive or do not use available sequence data that could serve as a resource for primer identification. Although no existing algorithm always yields a correct primer for every need, evaluating multi-sequence alignments may provide a reliable source of primer candidates. However, an exact mathematical solution for multi-sequence alignment is viable, with currently available computational resources, only for a very small number of sequences. Any solution for a larger number of sequences must therefore rely on other computational methods and heuristics to estimate an alignment.
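To give a sense of why an exact solution scales so poorly, the sketch below counts the table cells needed if the exact method is the standard dynamic-programming extension of pairwise alignment. That is an assumption, since the abstract does not name the exact method, and the function name `dp_cells` is hypothetical.

```python
def dp_cells(n_seqs: int, length: int) -> int:
    """Cells in a full dynamic-programming table for aligning n_seqs
    sequences, each of the given length (assumed: one table axis per
    sequence, with length + 1 positions along each axis)."""
    return (length + 1) ** n_seqs


# Growth in table size for sequences of 100 bases (illustrative length).
for n in (2, 3, 5, 10):
    print(n, dp_cells(n, 100))
```

Under this assumption the table grows exponentially with the number of sequences, which is consistent with the need for heuristic alignment tools on larger inputs.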
The solution presented here, which combines the ClustalW and HMMER alignment tools, identifies conserved regions in sequence data in a computationally efficient manner and, from these regions, suggests viable primer candidates. Computational complexity for the HMMER alignment step has been maintained at O(MN). The suggested process for creating sequence alignments led to a 15-fold improvement in performance over conventional methods while also successfully identifying fungal-specific primers, with individual examples showing a 90% or greater match for the given fungal phylum.
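As a rough illustration of how conserved regions in a finished alignment might be turned into primer candidates, the following sketch scores each alignment column and reports fixed-length windows in which every column is highly conserved. It is not the thesis's procedure: the function names, the window length, the conservation threshold, and the toy alignment are all assumptions made for the example, and the input is taken to be equal-length aligned strings with `-` marking gaps.

```python
from collections import Counter


def column_profile(alignment):
    """Return (consensus_base, conservation) for each alignment column.

    Conservation is the fraction of all sequences that carry the most
    common non-gap base; an all-gap column scores 0.0 with consensus '-'.
    """
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        if counts:
            base, top = counts.most_common(1)[0]
            profile.append((base, top / n))
        else:
            profile.append(("-", 0.0))
    return profile


def conserved_windows(alignment, window=20, min_score=0.9):
    """List (start, consensus) for windows whose columns all meet min_score."""
    profile = column_profile(alignment)
    hits = []
    for start in range(len(profile) - window + 1):
        span = profile[start:start + window]
        if all(score >= min_score for _, score in span):
            hits.append((start, "".join(base for base, _ in span)))
    return hits


# Toy alignment (hypothetical data), short window for readability.
toy = ["ACGTACGTAA", "ACGTACGTCA", "ACGTACGTGA", "ACGT-CGTTA"]
candidates = conserved_windows(toy, window=4, min_score=1.0)
```

In practice a candidate window would still need the usual primer checks (melting temperature, GC content, self-complementarity), which this sketch leaves out.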
Alignment quality was found to improve further when simple sorting methods were applied to the input sequence data.
The definitive version is available at https://doi.org/10.15368/theses.2011.104.