We propose a novel methodology for extracting semantic similarity knowledge from semi-structured sources, such as WordNet. Unlike existing approaches that only explore the structured information (e.g., the hypernym relationship in WordNet), we present a framework that allows us to utilize all available information, including natural language descriptions. Our approach constructs a semantic corpus. It is represented using a graph that models the relationship between phrases using numbers. The data in the semantic corpus can be used to measure the similarity between phrases, the similarity between documents, or to perform a semantic search in a set of documents that uses the meaning of words and phrases (i.e., search that is not keyword-based).


Computer Sciences

Number of Pages


Publisher statement

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.



URL: http://digitalcommons.calpoly.edu/csse_fac/248