Abstract

Given a set of documents and an input query expressed in natural language, the document search problem is to retrieve the most relevant documents. Unlike most existing systems, which perform document search based on keyword matching, we propose a search method that considers the meaning of the words in the query and the documents. As a result, our algorithm can return documents that have no words in common with the input query, as long as the documents are relevant. For example, a document that contains the words “Ford”, “Chrysler” and “General Motors” multiple times is relevant to the query “car” even if the word “car” does not appear in the document. Our semantic search algorithm is based on a similarity graph that stores the degree of semantic similarity between terms, where a term can be a word or a phrase. We experimentally validate our algorithm on the Cranfield benchmark, which contains 1400 documents and 225 natural language queries. The benchmark also lists the relevant documents for every query, as determined by human judgment. We show that our semantic search algorithm produces a higher mean average precision (MAP) score than a keyword matching algorithm. This result indicates that our approach can improve the quality of the results because it takes into account the meaning of the words and phrases in the documents and the queries.
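The "car" example and the MAP metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the graph `SIMILARITY`, its edge weights, and the scoring functions `semantic_score` and `keyword_score` are hypothetical stand-ins chosen for illustration. Only `average_precision` and `mean_average_precision` follow the standard definition of the metric.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy similarity graph; the weights are made up for illustration.
SIMILARITY: Dict[str, Dict[str, float]] = {
    "car": {"ford": 0.6, "chrysler": 0.6, "general motors": 0.6},
}


def term_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical terms, else the edge weight (0.0 if no edge)."""
    if a == b:
        return 1.0
    forward = SIMILARITY.get(a, {}).get(b, 0.0)
    backward = SIMILARITY.get(b, {}).get(a, 0.0)
    return max(forward, backward)


def semantic_score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Assumed scoring rule: sum similarities over all query/document term pairs."""
    return sum(term_similarity(q, d) for q in query_terms for d in doc_terms)


def keyword_score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Baseline: count exact matches between query terms and document terms."""
    return float(sum(1 for q in query_terms for d in doc_terms if q == d))


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    doc = ["ford", "chrysler", "general motors", "ford"]
    print(keyword_score(["car"], doc))   # no exact match for "car"
    print(semantic_score(["car"], doc))  # positive despite no shared words
    print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))
```

With exact matching the sample document scores zero for the query "car", while the graph-based score is positive. That difference is the behavior the abstract describes.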

Disciplines

Computer Sciences

Number of Pages

8

Publisher statement

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


URL: https://digitalcommons.calpoly.edu/csse_fac/262