Available at: https://digitalcommons.calpoly.edu/theses/1615
Date of Award
MS in Computer Science
The California state legislature introduces approximately 5,000 new bills each legislative session. While the legislative hearings are recorded on video, the recordings are not easily accessible to the public. The lack of official transcripts or summaries also increases the effort required to gain meaningful insight from those recordings. As a result, the news media and the general public remain largely unaware of what transpires during legislative sessions.
Digital Democracy, a project started by the Cal Poly Institute for Advanced Technology and Public Policy, is an online platform created to bring transparency to the California legislature. It features a searchable database of state legislative committee hearings, with each hearing accompanied by a transcript that was generated by an internal transcription tool.
This thesis presents SKEWER, a pipeline for building a spoken-word knowledge graph from those transcripts. SKEWER uses several natural language processing tools to extract named entities, phrases, and sentiments from the transcript text and aggregates their output into a graph database. The resulting graph can be queried to discover the positions of legislators, lobbyists, and the general public on specific bills or topics, and how those positions are expressed in committee hearings. Several case studies illustrate the new knowledge that can be acquired from the knowledge graph.
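The extract, aggregate, and query flow described above can be illustrated with a minimal Python sketch. It is not SKEWER's implementation: the `Utterance` and `Graph` types, the bill-number regular expression, the keyword lexicons, and the function names are all hypothetical stand-ins for the natural language processing tools and the graph database that the thesis uses, and an in-memory dictionary takes the place of the database.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stand-ins for real NLP tools: a regex in place of a named
# entity recognizer and two tiny keyword lists in place of a sentiment model.
BILL_RE = re.compile(r"\b(AB|SB)\s?(\d+)\b")
POSITIVE = {"support", "favor", "benefit"}
NEGATIVE = {"oppose", "harm", "concern"}


@dataclass(frozen=True)
class Utterance:
    """One speaker turn taken from a hearing transcript (assumed shape)."""
    speaker: str
    role: str
    text: str


@dataclass
class Graph:
    """Speaker nodes (with roles) and scored speaker-to-bill edges."""
    roles: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(list))


def extract_bills(text):
    """Return the bill identifiers mentioned in the text, e.g. 'AB 123'."""
    return sorted({f"{m.group(1)} {m.group(2)}" for m in BILL_RE.finditer(text)})


def score_sentiment(text):
    """Count positive keywords minus negative keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def build_graph(utterances):
    """Aggregate per-utterance extractions into one graph."""
    graph = Graph()
    for u in utterances:
        graph.roles[u.speaker] = u.role
        score = score_sentiment(u.text)
        for bill in extract_bills(u.text):
            graph.edges[(u.speaker, bill)].append(score)
    return graph


def positions_on(graph, bill):
    """Query: net sentiment of each speaker toward the given bill."""
    return {
        speaker: sum(scores)
        for (speaker, b), scores in graph.edges.items()
        if b == bill
    }
```

Given a list of `Utterance` objects, `positions_on(build_graph(utterances), "AB 123")` returns a mapping from each speaker to a net score. A real pipeline would replace the regex and keyword lists with trained models and would persist the nodes and edges in a graph database so that richer queries are possible.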