Abstract

We present a probabilistic model for extracting and storing information from WordNet and the British National Corpus. We map the data into a directed probabilistic graph that can be used to compute the conditional probability between a pair of English words. For example, the graph can be used to deduce that there is a 10% probability that someone who is interested in dogs is also interested in the word “canine”. We propose three ways of computing this probability, where the best results are achieved by performing multiple random walks in the graph. Unlike existing approaches that only process the structured data in WordNet, we process all available information, including natural language descriptions. The available evidence is expressed as simple Horn clauses with probabilities. It is then aggregated using a Markov Logic Network model to create the probabilistic graph. We experimentally validate the quality of the data on five benchmarks that contain collections of word pairs and their semantic similarity as judged by humans. In the experimental section, we show that our random walk algorithm with a logarithmic distance metric produces higher correlation with human judgments on three of the five benchmarks, and a better overall average correlation, than current state-of-the-art algorithms.
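The core idea of the random walk approach can be illustrated with a minimal sketch. This is not the paper's implementation: the toy graph, edge weights, walk count, and step bound below are all invented for illustration. It estimates the probability of reaching a target word from a source word by sampling many bounded random walks through a directed probabilistic graph:

```python
import random

# Toy directed probabilistic graph: node -> list of (neighbor, transition
# probability). Out-edge probabilities of each node sum to 1. All words and
# weights here are illustrative, not taken from WordNet or the BNC.
GRAPH = {
    "dog":    [("canine", 0.10), ("pet", 0.60), ("animal", 0.30)],
    "canine": [("dog", 0.50), ("animal", 0.50)],
    "pet":    [("dog", 0.40), ("animal", 0.60)],
    "animal": [("dog", 0.30), ("pet", 0.30), ("canine", 0.40)],
}

def random_walk_probability(source, target, walks=20000, max_steps=3, seed=0):
    """Estimate the probability of reaching `target` from `source` by
    sampling `walks` random walks, each at most `max_steps` steps long."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(walks):
        node = source
        for _ in range(max_steps):
            neighbors = GRAPH.get(node, [])
            if not neighbors:
                break
            # Sample the next node according to the edge probabilities.
            r, acc = rng.random(), 0.0
            for nxt, prob in neighbors:
                acc += prob
                if r < acc:
                    node = nxt
                    break
            if node == target:
                hits += 1
                break
    return hits / walks

print(random_walk_probability("dog", "canine"))
```

The estimate converges to the true reachability probability as the number of walks grows; the step bound keeps walks local, so closely related words score higher than distant ones.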

Disciplines

Computer Sciences

Number of Pages

19

Publisher statement

Employers/authors may copy, or authorize the copy of, the paper, or derivative portions of the paper for company/personal use, provided the copies are not offered for sale, that the source of the material is indicated, and that ISCA's endorsement is not implied by the use.


URL: https://digitalcommons.calpoly.edu/csse_fac/270