DOI: https://doi.org/10.15368/theses.2017.102
Available at: https://digitalcommons.calpoly.edu/theses/1798
Date of Award
12-2017
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Alexander Dekhtyar
Abstract
In the modern world, huge amounts of text are generated every minute. For example, Twitter users post their current emotions in tweets, while Facebook users vent about their experiences in posts. In just one minute, Twitter users upload 350,000 tweets, and Facebook users publish between 2.5 million and 3 million posts. To keep up with this growth in data, almost all of this information goes through automated text processing. To extract features such as opinion and subjectivity from text, sentiment analysis is applied to the corpus. In this thesis, we present the TONGS library for conducting sentiment analysis. TONGS uses Word2Vec within the TensorFlow library to convert words into vector space representations. The TONGS library contains four different methods built upon previous research in sentiment analysis and Word2Vec. We further experiment with and analyze these methods using the IMDB dataset. Finally, we introduce and test a new sentiment dataset from government hearings obtained through Digital Democracy, challenging the accuracy of the TONGS library on an unfamiliar topic.
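As a rough illustration of how word vectors can feed a sentiment decision, the sketch below averages the vectors of the words in a text and compares the result with positive and negative seed words by cosine similarity. It is not one of the four TONGS methods, which the abstract does not describe. The three-dimensional vectors, the seed words, and the names `TOY_VECTORS`, `document_vector`, `cosine`, and `classify` are all hypothetical; real Word2Vec embeddings are learned from a corpus and are much larger.

```python
import math

# Hypothetical hand-written vectors standing in for learned Word2Vec
# embeddings. The first component loosely encodes polarity.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "loved": [0.8, 0.2, 0.1],
    "awful": [-0.9, 0.1, 0.0],
    "boring": [-0.7, 0.3, 0.1],
    "movie": [0.0, 0.9, 0.2],
    "plot": [0.0, 0.8, 0.3],
}


def document_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words in text (None if no word is known)."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, pos_seed="great loved", neg_seed="awful boring"):
    """Label text by whichever seed centroid its averaged vector is closer to."""
    doc = document_vector(text)
    if doc is None:
        return "unknown"
    pos = cosine(doc, document_vector(pos_seed))
    neg = cosine(doc, document_vector(neg_seed))
    return "positive" if pos >= neg else "negative"


if __name__ == "__main__":
    for sample in ("great movie", "boring plot", "xyz"):
        print(sample, "->", classify(sample))
```

Averaging discards word order, so a sketch like this cannot handle negation ("not great"); it only shows the step from words to vectors to a label.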