Date of Award

12-2025

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Paul Anderson

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

The reproducibility crisis in biomarker discovery stems from traditional approaches that often treat molecular features as independent variables while ignoring the networked nature of biological systems. We present an interpretable-by-design framework that models individual tumor network states using graph attention networks (GATs) to discover robust breast cancer biomarkers. By constraining the search space through biologically informed gene selection and multi-relational graphs integrating protein-protein interactions, pathways, and co-expression networks, we guide the model toward genuine biological relationships rather than spurious correlations. Our ensemble GAT approach achieved 77.4% balanced accuracy for molecular subtype classification. Systematic analysis of attention weights revealed an unexpected finding: 98 of 99 high-confidence biomarkers were terminal nodes rather than network hubs, consistently connecting to established breast cancer drivers including TP53, EGFR, ESR1, and CCND1. We successfully distilled these network-based discoveries into a 70-gene diagnostic panel using interpretable linear models, achieving 75.3% accuracy with expression data alone. Our biologically constrained, interpretable-by-design approach demonstrates how network-guided machine learning yields both mechanistic understanding and reproducible biomarkers.

Available for download on Saturday, December 09, 2028
