DOI: https://doi.org/10.15368/theses.2018.170
Available at: https://digitalcommons.calpoly.edu/theses/2503
Date of Award
6-2018
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Franz Kurfess
Abstract
Reinforcement Learning (RL) is a subset of machine learning primarily concerned with goal-directed learning and optimal decision making. RL agents learn from a reward signal discovered through trial and error in complex, uncertain environments, with the goal of maximizing cumulative reward. RL approaches need to scale up as they are applied to more complex environments with extremely large state spaces. Inefficient exploration methods cannot sufficiently explore such environments in a reasonable amount of time: optimal policies go undiscovered, and RL agents fail to solve the environment.
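For illustration, here is a minimal sketch of this trial-and-error loop, assuming a toy chain environment and tabular Q-learning with epsilon-greedy exploration (both hypothetical; the thesis itself works with deep RL on Atari):

```python
import random

# Hypothetical toy environment: a 1-D chain of states; reaching the
# right end yields reward +1, every other step yields 0.
N_STATES, GOAL = 6, 5
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done

# Tabular Q-learning: the agent discovers the reward signal purely by
# trial and error, balancing exploration against exploitation.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:            # explore
            action = random.choice(ACTIONS)
        else:                                    # exploit
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # One-step temporal-difference update toward the observed reward.
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state
```

With an epsilon of 0 the agent above would never leave its initial greedy policy, which is the scaling failure the abstract describes: in large state spaces, weak exploration leaves the optimal policy undiscovered.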
This thesis proposes a novel variant of the Advantage Actor-Critic (A2C) algorithm. The variant is validated against two state-of-the-art RL algorithms, Deep Q-Network (DQN) and A2C, across six Atari 2600 games of varying difficulty. The experimental results are competitive with the state of the art while achieving lower variance and faster learning. Additionally, the thesis introduces a metric that objectively quantifies the difficulty of any Markovian environment with respect to the exploratory capacity of RL agents.
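As context for the baseline, a rough sketch of the standard A2C update (not the thesis's variant) is shown below, assuming a small shared actor-critic network in PyTorch; all sizes and coefficients here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Minimal shared-body actor-critic network (hypothetical dimensions;
# the thesis's architecture is not reproduced here).
class ActorCritic(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.policy = nn.Linear(64, n_actions)  # actor head: action logits
        self.value = nn.Linear(64, 1)           # critic head: V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)
    # Advantage A(s, a) = R - V(s); detached so the critic is trained
    # only through its own squared-error term.
    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()   # actor term
    value_loss = (returns - values).pow(2).mean()    # critic term
    entropy = dist.entropy().mean()                  # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

Weighting the policy gradient by the advantage rather than the raw return is what gives A2C its lower-variance updates relative to plain policy gradient methods, the same axis on which the thesis's variant is evaluated.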