Available at: https://digitalcommons.calpoly.edu/theses/2823
Date of Award
6-2024
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
Rodrigo Canaan
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
In the era of total digitization of documents, navigating vast and heterogeneous data landscapes presents significant challenges for effective information retrieval, for both humans and digital agents. Traditional methods of knowledge organization often struggle to keep pace with evolving user demands, resulting in suboptimal outcomes such as information overload and disorganized data. This thesis presents a case study of a pipeline that leverages principles from cognitive science, graph theory, and semantic computing to generate semantically organized knowledge graphs. By evaluating combinations of models, methodologies, and algorithms, the pipeline aims to enhance the organization and retrieval of digital documents. The proposed approach focuses on representing documents as vector embeddings, clustering similar documents, and constructing a connected and scalable knowledge graph. This graph not only captures semantic relationships between documents but also supports efficient traversal and exploration. The practical application of the system is demonstrated in the context of digital libraries and academic research, showcasing its potential to improve information management and discovery. The effectiveness of the pipeline is validated through extensive experiments using contemporary open-source tools.
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons