DOI: https://doi.org/10.15368/theses.2011.104
Available at: https://digitalcommons.calpoly.edu/theses/547
Date of Award
6-2011
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Timothy J. Kearns
Abstract
This thesis describes a computationally efficient method for identifying candidate DNA primer sequences. DNA sequencing primers are a critical element of polymerase chain reaction (PCR) and DNA sequence analysis. A variety of methods for deriving DNA primers exist, but they are often computationally intensive or do not use available sequence data that could serve as a resource for primer identification. Although no existing algorithm always yields a correct primer for every need, evaluating multi-sequence alignments may provide a reliable source of primer candidates. However, an exact mathematical solution for multi-sequence alignment is, with currently available computational resources, viable only for a very small number of sequences. Any solution for a larger number of sequences must therefore rely on other computational methods and heuristics to estimate an alignment.
The solution presented here combines the ClustalW and HMMER alignment tools to identify conserved regions in sequence data in a computationally efficient manner and, from these regions, to suggest viable primer candidates. Computational complexity for the HMMER alignment effort has been maintained at O(MN). The suggested process for creating sequence alignments led to a 15-fold improvement in performance over conventional methods, while also successfully identifying fungal-specific primers, with individual examples showing a 90% or greater match for the given fungal phylum.
Alignment quality was found to improve further when simple sorting methods were applied to the input sequence data.
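As a rough illustration of how conserved regions in a multi-sequence alignment can point to primer candidates, the sketch below scans an alignment column by column and reports fixed-width windows in which every column is highly conserved. It is not the thesis's implementation: it assumes the alignment has already been produced (for example by ClustalW or HMMER) and is supplied as equal-length strings with "-" marking gaps. The function names, the window width, and the conservation threshold are hypothetical choices made for this example only.

```python
from collections import Counter


def column_profile(alignment):
    """Return (consensus_base, conservation) for each alignment column.

    Conservation is the fraction of all sequences that carry the most
    common non-gap base in that column (an assumed, simple measure).
    """
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        if counts:
            base, count = counts.most_common(1)[0]
        else:
            base, count = "-", 0  # column made up entirely of gaps
        profile.append((base, count / n))
    return profile


def conserved_windows(alignment, width=4, threshold=0.9):
    """Return (start, consensus) for windows whose columns all meet threshold."""
    profile = column_profile(alignment)
    hits = []
    for start in range(len(profile) - width + 1):
        window = profile[start:start + width]
        if all(score >= threshold for _, score in window):
            hits.append((start, "".join(base for base, _ in window)))
    return hits


# Toy alignment (hypothetical data): three sequences, one variable column.
toy_alignment = ["ACGTACGT", "ACGTTCGT", "ACGT-CGT"]
candidates = conserved_windows(toy_alignment, width=4, threshold=1.0)
```

A real primer search would use much longer windows and apply further filters, such as melting temperature and GC content, which this sketch leaves out.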