Available at: https://digitalcommons.calpoly.edu/theses/2651
MS in Computer Science
College of Engineering
Committee meetings are a fundamental part of the legislative process, in which
constituents, lobbyists, and legislators alike can speak on proposed bills at the
local and state levels. Oftentimes, unspoken “rules” or standards at play in
political processes can influence the trajectory of a bill, leaving constituents
without a political background at an inherent disadvantage when engaging with
the legislative process. This thesis explores the extent to which the language
and phraseology of general-public testimony can influence a vote, and examines
how this information can be used to promote civic engagement.
The Digital Democracy database contains digital records of over 40,000 real
testimonies delivered by non-legislator members of the public at California
Legislature committee meetings from 2015 to 2018, along with each speaker’s
desired vote outcome and the individual legislators’ votes in that discussion.
With this data, we conduct a linguistic analysis that is then leveraged by the
Constituent Phraseology Analysis Tool (CPAT) to generate an intelligent,
user-specific statistical comparison between a proposed testimony and language
patterns that have previously been successful.
The following questions are at the core of this research: Which language
features, if any, are correlated with persuasive success in a legislative
context? Does the committee’s topic of discussion affect which language features
contribute to a testimony’s success? Can mirroring a legislator’s speech
patterns change the probability of a favorable vote? How can this information be
used to level the playing field for constituents who want their voices heard?
Given the 33 linguistic features developed in this research, supervised
classification models were able to predict testimonial success with up to 85.1%
accuracy, indicating that the new features had a significant impact on the
prediction of success. Adding these features to the 16 baseline linguistic
features developed in Gundala’s research improved prediction accuracy by up to 2.6%. We
also found that balancing the dataset of testimonies drastically affected the
prediction performance metrics: 93% accuracy was achieved on the imbalanced
dataset versus 60% after balancing. The Constituent Phraseology Analysis
Tool showed promise in generating linguistic analyses based on previously
successful language patterns, but requires further development before it is
truly usable. Additionally, predicting success from linguistic similarity to a
legislator on the committee produced contradictory results: experiments yielded
a 4% increase in predictive accuracy when comparative language features were
added to the feature set, but further experimentation with weight distributions
revealed only marginal impacts from those features.