Master's Theses

The Application, Construction, and Validation of Hidden Markov Model Profiles for Carbonic Anhydrase Enzymes

Samuel F. Kaplan, California Polytechnic State University, San Luis Obispo

Available at: https://digitalcommons.calpoly.edu/theses/3232

Date of Award

3-2026

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Paul Anderson

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

Carbonic anhydrases (CAs) catalyze the reversible hydration of CO2 and have evolved independently at least eight times, resulting in structurally distinct enzyme families (α, β, γ, δ, ζ, η, θ, ι). Traditional sequence alignment methods struggle to classify these convergently evolved proteins because sequence similarity does not reliably indicate functional or evolutionary relationships among them. Many CA sequences in public databases are annotated generically, without family assignments, and prior computational approaches have focused predominantly on the three well-characterized families (α, β, γ), leaving the five more recently discovered classes without robust classification tools. Family-level assignment is often a prerequisite for functional inference, inhibitor design, and phylogenetic analysis, because CA classes differ in active-site geometry, oligomeric structure, and catalytic mechanism, all of which are difficult to predict from sequence alone.

In this thesis, we constructed family-specific profile Hidden Markov Models (HMMs) for all eight CA families using curated seed alignments of 575 sequences. The resulting HMM profiles achieved 100% classification accuracy on held-out test sequences (n=119) and successfully classified 148,727 UniProt CA sequences, with 94.8% assigned to specific families. Bit score and entropy analysis revealed distinct intra-family variance, confirming that the models capture meaningful sequence differences that differentiate the family models. These profiles provide the first comprehensive HMM-based classification framework spanning all recognized CA families, enabling systematic annotation of carbonic anhydrase datasets.
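To make the classification step concrete, here is a minimal sketch of how a sequence might be assigned to a family once it has been scored against each family profile. The abstract does not describe the decision rule used in the thesis, so everything here is an assumption for illustration: the best-score-wins rule, the `min_score` cutoff of 25.0 bits, the family labels, and the function names `assign_family` and `assigned_fraction` are all hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical labels for the eight CA families named in the abstract.
FAMILIES = ("alpha", "beta", "gamma", "delta", "zeta", "eta", "theta", "iota")


def assign_family(bit_scores: Dict[str, float],
                  min_score: float = 25.0) -> Optional[str]:
    """Pick the family whose profile gives the highest bit score.

    bit_scores maps a family label to the bit score of one sequence
    against that family's profile HMM. The min_score cutoff is an
    assumed value, not one taken from the thesis. Returns None when
    no family reaches the cutoff, i.e. the sequence stays unassigned.
    """
    if not bit_scores:
        return None
    best_family = max(bit_scores, key=bit_scores.get)
    if bit_scores[best_family] < min_score:
        return None
    return best_family


def assigned_fraction(all_scores: Iterable[Dict[str, float]],
                      min_score: float = 25.0) -> float:
    """Fraction of sequences that receive a specific family label."""
    labels = [assign_family(scores, min_score) for scores in all_scores]
    if not labels:
        return 0.0
    return sum(label is not None for label in labels) / len(labels)
```

Under this rule, a sequence that scores well against one profile and poorly against the others gets that family's label, while a sequence with only weak hits stays unassigned. A summary such as the share of sequences assigned to specific families would then correspond to `assigned_fraction` over the whole dataset, although the thesis may define assignment differently.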

Included in

Bioinformatics Commons, Computer Sciences Commons
