DOI: https://doi.org/10.15368/theses.2015.136
Available at: https://digitalcommons.calpoly.edu/theses/1474
Date of Award
9-2015
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Foaad Khosmood
Abstract
Information Extraction (IE) is the process of analyzing documents and identifying desired pieces of information within them. Many IE systems have been developed over the last couple of decades, but IE remains an open research problem with room for improvement. This work discusses the development of a hybrid IE system that attempts to combine the strengths of rule-based and statistical IE systems while avoiding their respective pitfalls, with the goal of achieving high performance for any type of information on any type of document. Test results show that the system performs competitively when the target information belongs to a highly structured data type and when critical contextual information is in close proximity to the target.