Available at: https://digitalcommons.calpoly.edu/theses/3205
Date of Award
12-2025
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Foaad Khosmood
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Among the many California bill hearings, it is crucial to identify which are most likely to receive media coverage, because coverage signals the importance of both the bill and the hearing and underscores their societal impact. Digital Democracy, established at Cal Poly’s Institute for Advanced Technology and Public Policy, gives users access to a distinctive dataset covering state legislative committee hearings, hearing transcripts, assets, bill information, and more.
This thesis presents a machine learning framework for predicting whether California legislative bills are likely to receive media coverage. Drawing on a dataset of legislative “tip sheets” provided by Digital Democracy and media coverage records from CalMatters, we developed a semantic similarity-based mapping engine to identify relevant bill-article pairs; manual validation showed that it achieved 93% precision. This approach expanded our dataset from 424 manually verified pairs to 1,824 samples (1,400 additional samples, a 3.3× increase over the verified set), enabling more robust model training.
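The abstract does not say how the mapping engine computes similarity. As a minimal sketch, assuming each bill text is compared with each article text and a pair is kept when its score clears a threshold, the code below uses bag-of-words cosine similarity as a simple, dependency-free stand-in for the semantic (embedding-based) similarity the thesis describes. The function names, sample texts, and the 0.3 threshold are hypothetical. The `precision` helper shows how a figure such as the reported 93% follows from manual review: correct pairs divided by pairs reviewed.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts. A crude stand-in (assumption) for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_bills_to_articles(bills, articles, threshold=0.3):
    """Return (bill_id, article_id, score) for every pair scoring at or above the threshold.

    bills and articles map ids to text; the 0.3 default threshold is hypothetical.
    """
    bill_vecs = {bid: bag_of_words(text) for bid, text in bills.items()}
    article_vecs = {aid: bag_of_words(text) for aid, text in articles.items()}
    pairs = []
    for bid, bvec in bill_vecs.items():
        for aid, avec in article_vecs.items():
            score = cosine_similarity(bvec, avec)
            if score >= threshold:
                pairs.append((bid, aid, score))
    # Strongest candidates first, so manual review can start with the most likely matches.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


def precision(verified_correct: int, total_reviewed: int) -> float:
    """Share of reviewed candidate pairs that a human judged to be true matches."""
    return verified_correct / total_reviewed


# Hypothetical sample data for illustration only.
bills = {
    "AB 1": "wildfire insurance rates for homeowners",
    "SB 2": "charter school funding formula",
}
articles = {
    "a1": "Homeowners face rising wildfire insurance rates",
    "a2": "Lawmakers debate school funding",
}

for bill_id, article_id, score in map_bills_to_articles(bills, articles):
    print(f"{bill_id} -> {article_id}: {score:.2f}")
# AB 1 -> a1: 0.73
# SB 2 -> a2: 0.50
```

In a real pipeline, `bag_of_words` would be swapped for a sentence-embedding model so that paraphrases with little word overlap still score highly; the thresholding and ranking logic would stay the same.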
We evaluated several machine learning algorithms. Our best-performing model, XGBoost, achieved 89.91% accuracy, 90.81% precision, and an 89.90% F1-score. Feature importance analysis revealed that temporal features dominate the predictions, accounting for 62.3% of total importance.
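To make the reported figures concrete, the sketch below shows how accuracy, precision, and F1 are derived from binary predictions, and how a statement such as "temporal features account for 62.3% of total importance" can be obtained by summing per-feature importances within a group and dividing by the overall total. This is an illustration under stated assumptions, not the thesis's code. The abstract does not say whether its F1 is binary or averaged across classes; this sketch computes binary F1 for the positive class (1 = received coverage). The per-feature importances would come from a fitted model (for example, `XGBClassifier.feature_importances_`). The feature names, group labels, and importance values here are hypothetical.

```python
from collections import defaultdict


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and binary F1 (positive class = 1, i.e. covered by media)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def importance_share_by_group(importances, feature_groups):
    """Sum per-feature importances within each group, then normalize to fractions of the total."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[feature_groups[feature]] += value
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


# Hypothetical labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}

# Hypothetical feature importances and group assignments.
importances = {
    "days_until_hearing": 0.30,
    "hearing_month": 0.20,
    "num_authors": 0.25,
    "committee_size": 0.25,
}
feature_groups = {
    "days_until_hearing": "temporal",
    "hearing_month": "temporal",
    "num_authors": "bill",
    "committee_size": "hearing",
}
shares = importance_share_by_group(importances, feature_groups)
print({group: round(share, 3) for group, share in shares.items()})
# {'temporal': 0.5, 'bill': 0.25, 'hearing': 0.25}
```

Normalizing by the grand total makes the group shares sum to 1, so "62.3% of total importance" reads as the fraction of the model's overall importance mass attributable to temporal features.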
The results demonstrate that machine learning models can effectively predict media attention for legislative bills, providing a practical foundation for media resource allocation, legislative transparency, and predictive analysis in political journalism and public affairs.