DOI: https://doi.org/10.15368/theses.2017.112
Available at: https://digitalcommons.calpoly.edu/theses/1796

Date of Award

12-2017

Degree Name

MS in Computer Science

Department/Program

Computer Science

Advisor

Foaad Khosmood

Abstract

Genealogical records play a crucial role in helping people to discover their lineage and to understand where they come from. They provide a way for people to celebrate their heritage and to possibly reconnect with family they had never considered. However, genealogical records are hard to come by for ordinary people since their information is not always well established in known databases. There is often free form text that describes a person’s life, but this must be manually read in order to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create an extensive tree. This thesis proposes a novel three-part system which can automatically interpret free form text to extract relationships and produce a family tree compliant with GEDCOM formatting. The first subsystem builds an extendable database of genealogical records that are systematically extracted from free form text. This corpus provides the tagged data for the second subsystem, which trains a Naïve Bayes classifier to predict relationships from free form text by examining the types of relationships for pairs of entities and their associated feature vectors. The last subsystem accumulates extracted relationships into family trees. When a multiclass Naïve Bayes classifier is used, the proposed system achieves an accuracy of 54%. When binary Naïve Bayes classifiers are used, the proposed system achieves accuracies of 69% for the child-to-parent relationship classifier, 75% for the spousal relationship classifier, and 73% for the sibling relationship classifier.

Included in

Computer Engineering Commons

Master's Theses

Genealogy Extraction and Tree Generation from Free Form Text
