We propose a novel methodology for extracting semantic similarity knowledge from semi-structured sources such as WordNet. Unlike existing approaches, which explore only the structured information (e.g., the hypernym relationship in WordNet), our framework utilizes all available information, including natural language descriptions. Our approach constructs a semantic corpus, represented as a graph that quantifies the relationships between phrases with numeric weights. The data in the semantic corpus can be used to measure the similarity between phrases, to measure the similarity between documents, or to perform semantic search over a set of documents, that is, search based on the meaning of words and phrases rather than on keyword matching.
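
As a rough illustration of how a graph with numeric relationships between phrases could yield a similarity score, consider the minimal sketch below. It is not the method of this paper: the toy graph, its weights, and the scoring rule (the largest product of edge weights along any connecting path) are all assumptions chosen only to make the idea concrete.

```python
import heapq

# Hypothetical toy graph. The phrases and the weights in (0, 1] are made-up
# relatedness scores for illustration; they are not taken from WordNet.
GRAPH = {
    "dog": {"canine": 0.9, "pet": 0.6},
    "canine": {"dog": 0.9, "wolf": 0.7},
    "pet": {"dog": 0.6, "cat": 0.6},
    "wolf": {"canine": 0.7},
    "cat": {"pet": 0.6},
}


def similarity(graph, source, target):
    """Return the largest product of edge weights over any path from
    source to target, or 0.0 if target is unreachable.

    Assumed scoring rule, valid for weights in (0, 1]: a Dijkstra-style
    search that maximizes the path product instead of minimizing a sum.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == target:
            return score
        if score < best.get(node, 0.0):
            continue  # stale heap entry; a better path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0


if __name__ == "__main__":
    print(similarity(GRAPH, "dog", "wolf"))
    print(similarity(GRAPH, "dog", "cat"))
```

Under this assumed rule, phrases joined by a chain of strong relationships score higher than phrases joined only by weak or long chains, and unconnected phrases score zero.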


