Available at: https://digitalcommons.calpoly.edu/theses/3021
Date of Award
6-2025
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
John S. Seng
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Previous research has demonstrated that reinforcement learning agents can learn to steer differential-drive robots around obstacles using 2D lidar scans as observations. However, these studies typically treat all range returns as undifferentiated obstacles—objects to avoid—without distinguishing between different object types. This thesis builds upon previous research by introducing an adversarial task in which an agent must interpret raw range readings to both avoid static obstacles and identify, pursue, and engage a hostile target.
To investigate this problem, this thesis introduces TankGame, a novel, lightweight 2D tank duel simulator. Each agent receives a 360° lidar scan, controls its motion via tread velocities, and fires slow-moving projectiles within arenas containing either procedurally generated or handcrafted obstacle layouts. Obstacles are modeled as circles of varying radii, while tanks and projectiles have rectangular profiles, requiring agents to infer object types purely from geometric cues in the lidar data. Implemented in C++ with a standardized Python API, TankGame achieves approximately 4,000 simulation steps per second on a single CPU thread—offering a high-fidelity yet computationally efficient alternative to grid-world abstractions and complex game engines.
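The following is a minimal sketch of the agent-environment loop implied by this description, assuming TankGame exposes a Gym-style reset/step interface over the 360° lidar observation and a (left tread velocity, right tread velocity, fire) action. The DummyTankEnv stand-in, its method signatures, and all numeric ranges are illustrative assumptions, not the actual TankGame API; the stand-in only exists so the loop runs without the real library.

import numpy as np

class DummyTankEnv:
    """Stand-in with the observation/action structure described above:
    a 360-element lidar scan as the observation, and an action of
    (left tread velocity, right tread velocity, fire flag)."""

    def reset(self, seed=None):
        self.rng = np.random.default_rng(seed)
        self.t = 0
        return self._scan(), {}

    def step(self, action):
        left_v, right_v, fire = action
        self.t += 1
        obs = self._scan()
        reward = 0.0           # placeholder; real reward shaping is task-specific
        terminated = False     # e.g., one tank has been destroyed
        truncated = self.t >= 1000
        return obs, reward, terminated, truncated, {}

    def _scan(self):
        # One range reading per degree; obstacles, tanks, and projectiles all
        # appear only as distances, so object type must be inferred from geometry.
        return self.rng.uniform(0.5, 30.0, size=360).astype(np.float32)

env = DummyTankEnv()
obs, info = env.reset(seed=0)
done = False
while not done:
    # Random policy for illustration: tread velocities in [-1, 1] and a fire flag.
    action = (np.random.uniform(-1, 1), np.random.uniform(-1, 1), np.random.rand() < 0.1)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated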
Alongside the TankGame environment, this thesis presents a three-stage training methodology. In the first stage, a Proximal Policy Optimization (PPO) agent with a Circular Convolutional Neural Network (CCNN) encoder is trained against a static opponent, allowing it to jointly learn lidar features and a baseline policy. In the second stage, new actor and critic heads, each incorporating a Long Short-Term Memory (LSTM) layer, are trained atop the frozen CCNN encoder, enabling the agent to exploit temporal dependencies within the lidar features. Finally, in the third stage, the LSTM-PPO agent undergoes adversarial fine-tuning through self-play to improve its robustness against adaptive opponents.
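Below is a minimal PyTorch sketch of the encoder and recurrent heads described in stages one and two; all layer sizes, kernel widths, and class names are illustrative assumptions rather than the thesis's exact architecture. The circular convolutions (Conv1d with circular padding) respect the wrap-around adjacency of the 360° scan, and the stage-two heads add an LSTM over features produced by the frozen encoder.

import torch
import torch.nn as nn

class CircularLidarEncoder(nn.Module):
    """Circular padding treats the scan as a ring: the reading at 359°
    is adjacent to the reading at 0°."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3, padding_mode="circular"),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2, padding_mode="circular"),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 90, feat_dim)   # 360 -> 180 -> 90 after two stride-2 convs

    def forward(self, scan):                     # scan: (batch, 360)
        return self.fc(self.conv(scan.unsqueeze(1)))

class RecurrentActorCritic(nn.Module):
    """Stage-two heads: an LSTM over encoded lidar features feeds separate
    actor and critic layers while the stage-one encoder stays frozen."""

    def __init__(self, encoder, feat_dim=128, hidden=128, n_actions=3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():      # freeze the pretrained encoder
            p.requires_grad = False
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)   # e.g., tread velocities + fire logits
        self.critic = nn.Linear(hidden, 1)

    def forward(self, scans, state=None):        # scans: (batch, time, 360)
        b, t, n = scans.shape
        feats = self.encoder(scans.reshape(b * t, n)).reshape(b, t, -1)
        out, state = self.lstm(feats, state)
        return self.actor(out), self.critic(out), state

Under this reading, stage one would optimize the encoder plus a feed-forward head with PPO against a static opponent, and stage three would fine-tune the recurrent network above through self-play; the exact action distribution and PPO hyperparameters are left to the thesis itself.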
Agents are trained on procedurally generated maps and evaluated on three sets of evaluation maps. A batch of 400 procedurally generated maps assesses overall performance, while two collections of handcrafted maps test obstacle navigation and the ability to locate and eliminate static or moving targets. Experimental results demonstrate that the LSTM-PPO agent defeats the PPO baseline in 64% of head-to-head duels (the baseline wins 20%) and locates and eliminates static targets on 97% of the procedurally generated maps (versus 88% for the baseline), with additional gains achieved through adversarial fine-tuning.
The key contributions of this work are the open-source TankGame environment, a reproducible training methodology, a suite of evaluation maps, and high-performing baseline agents to support future research in adversarial reinforcement learning.