Available at: https://digitalcommons.calpoly.edu/theses/3158
Date of Award
9-2025
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Foaad Khosmood
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
The California Fair Political Practices Commission (FPPC) receives a high volume of email inquiries from public officials, the general public, and other agencies. To respond in a timely manner, staff must currently search informational documents and manuals by hand, a process that is both labor- and time-intensive.
To address this challenge, we design a question-answering (QA) system that drafts responses to emailed questions by retrieving relevant information from the FPPC’s manuals using a retrieval-augmented generation (RAG) framework. Although the current implementation focuses on a single manual, the system is designed to be adaptable to the broader set of FPPC documents. By converting the FPPC’s publicly available PDF manuals into Markdown files, we parse and segment text into sections, retrieve documentation relevant to a query, and utilize a large language model (OpenAI’s ChatGPT-4.1) to generate responses grounded in the retrieved text.
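The parse, segment, retrieve, and prompt steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the bag-of-words cosine scoring, the prompt wording, and the sample manual text are all assumptions made for illustration, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a manual converted to Markdown (not real FPPC text).
SAMPLE_MANUAL = """# Filing Deadlines
Placeholder text about when reports are due.

# Gift Limits
Placeholder text about gift limits for officials.
"""


def split_markdown_sections(markdown_text):
    """Split Markdown into (heading, body) pairs at ATX headings (#, ##, ...)."""
    sections = []
    heading, body = "Preamble", []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # Close the previous section if it has any non-blank content.
            if any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, sections, top_k=2):
    """Return up to top_k sections ranked by similarity to the query.

    Assumption: simple term-count cosine stands in for whatever
    retriever a real system would use.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(heading + " " + body))), heading, body)
        for heading, body in sections
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(heading, body) for score, heading, body in scored[:top_k] if score > 0]


def build_prompt(query, retrieved):
    """Assemble a prompt that asks the model to answer from the excerpts only."""
    context = "\n\n".join(f"[{heading}]\n{body}" for heading, body in retrieved)
    return (
        "Answer the question using only the manual excerpts below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


sections = split_markdown_sections(SAMPLE_MANUAL)
top_sections = retrieve("What are the gift limits?", sections, top_k=1)
print(build_prompt("What are the gift limits?", top_sections))
```

In a full pipeline, the string returned by `build_prompt` would be sent to the language model, and the retrieved section headings could be kept alongside the draft so that a reviewer can check the answer against the manual.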
Our hypothesis is that a RAG-based QA system with custom document parsing for information retrieval performs better than direct application of state-of-the-art LLMs such as ChatGPT; that is, it gives more accurate answers that are grounded in the source data.
We validate our approach through manual review of retrieval accuracy and generated answers, as well as through a survey on Prolific, where participants compare responses from our system against those generated by directly querying ChatGPT with the FPPC manual.