Date of Award

3-2024

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Foaad Khosmood

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

Digital Democracy is a CalMatters and California Polytechnic State University initiative to promote transparency in state government by increasing access to the California legislature. While Digital Democracy is made up of many resources, one foundational step of the project is obtaining accurate, timely transcripts of California Senate and Assembly hearings. The information extracted from these transcripts provides crucial data for subsequent steps in the pipeline. In the context of Digital Democracy, upleveling is when humans verify, correct, and annotate the transcript results after the legislative hearings have been automatically transcribed. The upleveling process is done with the assistance of a software application called the Transcription Tool. The human upleveling process is the most costly and time-consuming step of the Digital Democracy pipeline. In this thesis, we hypothesize that we can make significant reductions to the time needed for upleveling by using Natural Language Processing (NLP) systems and techniques. The main contribution of this thesis is engineering a new automatic transcription pipeline. Specifically, this thesis integrates a new automatic speech recognition service, a new speaker diarization model, additional text post-processing changes, and a new process for speaker identification. To evaluate the system’s improvements, we measure the accuracy and speed of the newly integrated features and record editor upleveling time both before and after the additions.
