DOI: https://doi.org/10.15368/theses.2021.98
Available at: https://digitalcommons.calpoly.edu/theses/2421
Date of Award: 6-2021
Degree Name: MS in Computer Science
Department/Program: Computer Science
College: College of Engineering
Advisor: Franz J. Kurfess
Advisor Department: Computer Science
Advisor College: College of Engineering
Abstract
Stock price prediction is of strong interest to both researchers and investors, but it remains a challenging task. Recently, sentiment analysis and machine learning have been adopted for predicting stock price movement. In particular, retail investors’ sentiment on online forums has shown its power to influence the stock market. In this thesis, a novel system was built to predict stock price movement for the following trading day. The system includes a web scraper, an enhanced sentiment analyzer, a machine learning engine, an evaluation module, and a recommendation module. It automatically selects the best prediction model from four state-of-the-art machine learning models (Long Short-Term Memory, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) based on the acquired data and the models’ performance. Moreover, stock market lexicons were created by applying large-scale text mining and natural language processing to the Yahoo Finance Conversation boards. Experiments on the top 30 stocks on Yahoo users’ watchlists and a randomly selected NASDAQ stock were performed to evaluate the system and the proposed methods. The experimental results show that incorporating sentiment analysis can improve prediction for stocks with a large daily discussion volume. The Long Short-Term Memory model outperformed the other machine learning models when using both price and sentiment as inputs. In addition, the Extreme Gradient Boosting (XGBoost) model achieved the highest accuracy using the price-only feature on low-volume stocks. Finally, the models using the enhanced sentiment analyzer outperformed those using the VADER sentiment analyzer by 1.96%.
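
The automatic model selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate model is reduced to a function mapping a feature row to a predicted movement (1 for up, 0 for down) and that the winner is the model with the highest accuracy on a held-out validation set. The abstract does not state the actual selection criterion, so the function names, the toy decision rules, and the made-up validation rows below are illustrative only and are not the thesis's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], int]  # returns 1 (price up) or 0 (price down)


def accuracy(model: Model, rows: List[Features], labels: List[int]) -> float:
    """Fraction of validation rows whose movement the model predicts correctly."""
    if not rows:
        raise ValueError("validation set is empty")
    correct = sum(1 for x, y in zip(rows, labels) if model(x) == y)
    return correct / len(rows)


def select_best_model(
    candidates: Dict[str, Model], rows: List[Features], labels: List[int]
) -> Tuple[str, float]:
    """Score every candidate on the same validation set and return the best one."""
    scores = {name: accuracy(m, rows, labels) for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical stand-ins for trained models.
# Assumed feature row: (previous-day return, sentiment score).
def price_only(x: Features) -> int:
    return 1 if x[0] > 0 else 0


def price_and_sentiment(x: Features) -> int:
    return 1 if x[0] + x[1] > 0 else 0


# Made-up toy validation data, not taken from the thesis.
validation_rows = [(0.01, 0.5), (-0.02, 0.6), (0.03, -0.4), (-0.01, -0.2)]
validation_labels = [1, 1, 0, 0]

best_name, best_acc = select_best_model(
    {"price_only": price_only, "price_and_sentiment": price_and_sentiment},
    validation_rows,
    validation_labels,
)
print(best_name, best_acc)
```

In a real pipeline the stand-in functions would be replaced by the fitted models' prediction methods, and the validation split would respect time order so that no future prices leak into training.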