DOI: https://doi.org/10.15368/theses.2012.90
Available at: https://digitalcommons.calpoly.edu/theses/773

Date of Award

6-2012

Degree Name

MS in Computer Science

Department/Program

Computer Science

Advisor

Franz Kurfess

Abstract

This thesis examines the suitability of suffix trees for full-text indexing and retrieval. Suffix trees are typically built at the character level, where the tree records which characters follow one another. When suffix trees are instead built over the words of the documents, the resulting tree effectively indexes every word or sequence of words that occurs in any of the documents. Ukkonen's algorithm is adapted to build word-level suffix trees, but the primary focus is on developing algorithms for searching the suffix tree for exact and approximate (fuzzy) matches to arbitrary query strings. A proof-of-concept implementation is built and compared to a Lucene index for retrieval over a subset of the Reuters RCV1 data set.
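
To make the word-level idea concrete, the following is a minimal sketch, not the thesis implementation. It assumes whitespace tokenization and lowercasing, and it builds an uncompressed suffix trie by inserting every suffix directly, which takes quadratic time per document and is not Ukkonen's linear-time construction. The names WordSuffixTrie, add_document, and search are hypothetical and chosen only for illustration; only exact phrase lookup is shown, not approximate matching.

```python
class WordSuffixTrie:
    """Naive word-level suffix trie (uncompressed; NOT Ukkonen's algorithm)."""

    def __init__(self):
        # Each node maps a word to a child node and records which
        # documents contain the word sequence leading to that node.
        self.root = {"children": {}, "docs": set()}

    def add_document(self, doc_id, text):
        # Assumption: tokens are lowercased, whitespace-separated words.
        words = text.lower().split()
        # Insert every suffix of the word sequence, one word per edge.
        for start in range(len(words)):
            node = self.root
            for word in words[start:]:
                node = node["children"].setdefault(
                    word, {"children": {}, "docs": set()}
                )
                node["docs"].add(doc_id)

    def search(self, phrase):
        # Walk one edge per query word; a missing edge means no match.
        node = self.root
        for word in phrase.lower().split():
            node = node["children"].get(word)
            if node is None:
                return set()
        return set(node["docs"])


trie = WordSuffixTrie()
trie.add_document("d1", "the quick brown fox")
trie.add_document("d2", "the brown fox jumps")
print(trie.search("brown fox"))
```

Because every suffix of every document is inserted, any contiguous word sequence from a document is a path starting at the root, so an exact phrase query costs one dictionary lookup per query word.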

Included in

Databases and Information Systems Commons

Master's Theses

Suffix Trees for Document Retrieval
