Available at: https://digitalcommons.calpoly.edu/theses/3062
Date of Award
6-2025
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Borislav Hristov
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Current computational tools for analyzing chromatin organization focus mainly on intrachromosomal interactions, despite growing evidence that long-range interactions across chromosomes contribute to transcriptional regulation and disease development. This thesis addresses this gap in interchromosomal genome analysis by presenting a robust computational pipeline that identifies a clique (i.e., a densely connected subgraph) of highly interacting trans-chromosomal regions anchored at a user-specified seed genomic locus. A weighted interaction network is constructed from an input contact matrix produced by Hi-C, a widely used experimental assay for measuring genome-wide chromatin interactions. We model this contact matrix as a graph and devise three strategies to computationally find biologically important cliques: (1) a greedy heuristic for efficient local exploration, (2) a simulation-based random walk with restarts, and (3) an analytical formulation of the same random walk process. To validate the performance of this pipeline, we focus on TTN, a key muscle gene whose splicing is essential for human heart development. Hi-C data from wild-type and TTN promoter knockout cardiomyocytes are used to compare structural differences in TTN's long-range interactors. Though sparse contacts in the knockout data limit definitive comparison, cliques built from the wild-type matrix reveal loci with strong gene correlation. We further design several background models to statistically assess the significance of these interactions. Our results highlight the effectiveness of network-based methods in uncovering functionally relevant interchromosomal interactions and lay the groundwork for future analyses.
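As an illustration of strategy (3), the following is a minimal sketch of one common closed form for a random walk with restart on a weighted contact graph. It is not the thesis's implementation: the function names `rwr_analytical` and `top_k_clique`, the restart probability of 0.5, the column normalization, and the toy 5-bin matrix are all assumptions made for this example. The stationary visiting probabilities p are obtained by solving (I - (1 - c) W) p = c e, where W is the column-normalized contact matrix, c is the restart probability, and e is the indicator vector of the seed bin.

```python
import numpy as np

def rwr_analytical(contacts, seed, restart=0.5):
    """Stationary probabilities of a random walk with restart (hypothetical helper)."""
    A = np.asarray(contacts, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # leave isolated bins as zero columns
    W = A / col_sums                       # column-normalized transition matrix
    e = np.zeros(A.shape[0])
    e[seed] = 1.0                          # the walk restarts at the seed bin
    # Solve (I - (1 - c) W) p = c e instead of simulating the walk.
    return np.linalg.solve(np.eye(A.shape[0]) - (1.0 - restart) * W, restart * e)

def top_k_clique(contacts, seed, k=3, restart=0.5):
    """Seed plus the k - 1 bins the walk visits most often (assumed selection rule)."""
    p = rwr_analytical(contacts, seed, restart)
    ranked = [int(i) for i in np.argsort(-p) if i != seed]
    return [seed] + ranked[: k - 1]

# Toy symmetric contact matrix (made-up values): bins 0, 1, 2 interact strongly.
toy_contacts = [
    [0, 9, 8, 1, 0],
    [9, 0, 7, 0, 1],
    [8, 7, 0, 1, 0],
    [1, 0, 1, 0, 2],
    [0, 1, 0, 2, 0],
]
print(top_k_clique(toy_contacts, seed=0, k=3))
```

Solving the linear system gives the same limiting distribution that a long simulated walk would approximate, without sampling noise. Ranking by visiting probability is only one possible way to pick clique members, and a real Hi-C matrix would typically be restricted to trans-chromosomal entries and normalized before this step.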