DOI: https://doi.org/10.15368/theses.2022.87
Available at: https://digitalcommons.calpoly.edu/theses/2626
Date of Award
6-2022
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Alexander Dekhtyar
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Over the past two decades there has been a rapid decline in public oversight of state and local governments. From 2003 to 2014, the number of journalists assigned to cover proceedings in state houses declined by more than 30%. During the same period, non-profit projects such as Digital Democracy sought to collect and store legislative bill and hearing information on behalf of the public. More recently, AI4Reporters, an offshoot of Digital Democracy, has sought to actively summarize interesting legislative data.
This thesis presents STRAINER, an active data retrieval and filtering system for surfacing newsworthy legislative data, developed in parallel with AI4Reporters. Within STRAINER, we define and implement a pipeline that collects information about legislative bill discussion events from a variety of sources and aggregates it into feature sets suitable for machine learning. Using two independent labeling techniques, we trained a variety of SVM and Logistic Regression models to predict the newsworthiness of bill discussions that took place in the California State Legislature during the 2017-2018 session. Our models correctly retrieved more than 80% of newsworthy discussions.
Included in
Computational Engineering Commons, Journalism Studies Commons, Other Computer Engineering Commons