DOI: https://doi.org/10.15368/theses.2016.10
Available at: https://digitalcommons.calpoly.edu/theses/1538
Date of Award
3-2016
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Alex Dekhtyar
Abstract
The only official documentation of the lawmaking process at the California Legislature consists of unedited video recordings of committee hearings, bill texts, votes, and analyses. While the bills resulting from these hearings are clear, using video recordings to understand how a bill was created is far too laborious for the average citizen. To increase public transparency, a service that provides easier access to the bill creation process was needed. In response, the Digital Democracy initiative was established at Cal Poly by the Honorable Sam Blakeslee, former California State Senator and founder of the Institute for Advanced Technology and Public Policy.
The Digital Democracy initiative seeks to create a web platform that organizes, generates, and indexes large amounts of information about the legislative process. To accomplish this, automatic speech recognition is performed on the video recordings of committee hearings, and the resulting text is manually improved and annotated with a web application called the "Transcription Tool". Unfortunately, this process is costly and labor-intensive, and it prohibits the scaling and long-term viability of the platform. Early efforts to reduce transcription costs involved the development of an improved Transcription Tool UI and of systems for speaker diarization and text correction.
This thesis evaluates the effectiveness of these improvements on the human-assisted transcription process employed by the Digital Democracy initiative. To facilitate this evaluation, a pipeline for automatic transcription improvement was developed, the improvements were incorporated into the transcription process, and a controlled experiment was run to measure their effects. The results of the experiment demonstrate that the improvements reduced transcription editing costs by 16.89% while maintaining similar transcription quality.