Theses and Dissertations
ORCID
https://orcid.org/0009-0009-7908-158X
Advisor
Swan, John
Committee Member
Banicescu, Ioana
Committee Member
Francom, Greg
Date of Degree
12-12-2025
Original embargo terms
Immediate Worldwide Access
Document Type
Graduate Thesis - Open Access
Major
Computer Science (Research Computer Science)
Degree Name
Master of Science (M.S.)
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
Optimal decision-making under uncertainty is a challenge in everything from board games to business scenario planning. With visual programming systems, end-users can use reinforcement learning (RL) agents to identify an optimal policy in their own stochastic environments. However, while an RL agent can demonstrate that an optimal policy exists, RL policies tend to be opaque and difficult to interpret. This thesis investigates how Large Reasoning Models (LRMs) can generate human-understandable rules from RL policies. I compare multiple approaches and models, varying both the format of the RL policy provided to the LRM and the prompting strategy. My results show that the efficacy of each approach depends on the environment: in low-stochasticity environments, LRMs reason more effectively by observing RL agent episode trajectories, whereas in high-stochasticity environments, LRMs reason more effectively by reviewing the RL agent’s policy itself. This work contributes to bridging the gap between RL’s computational power and the need for transparent, human-simulatable decision rules.
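As a concrete illustration of the two policy-presentation formats the abstract compares, the Python sketch below serializes a toy tabular policy and a sampled episode trajectory into prompts for an LRM. This is a minimal sketch under assumed conventions: the function names (serialize_policy, serialize_trajectories, build_prompt) and the toy inventory environment are illustrative and do not come from the thesis itself.

    # Illustrative sketch (not from the thesis): two ways to present a
    # trained RL agent's behavior to an LRM for rule extraction.
    from typing import Dict, List, Tuple

    State = str
    Action = str

    def serialize_policy(policy: Dict[State, Action]) -> str:
        """Render a tabular policy as explicit state -> action rules."""
        return "\n".join(f"In state {s}, take action {a}." for s, a in policy.items())

    def serialize_trajectories(episodes: List[List[Tuple[State, Action, float]]]) -> str:
        """Render observed episode trajectories as (state, action, reward) steps."""
        lines = []
        for i, episode in enumerate(episodes, start=1):
            steps = " -> ".join(f"({s}, {a}, r={r})" for s, a, r in episode)
            lines.append(f"Episode {i}: {steps}")
        return "\n".join(lines)

    def build_prompt(policy_text: str) -> str:
        """Wrap either serialization in an instruction asking for plain-language rules."""
        return (
            "Below is the behavior of a trained RL agent.\n"
            f"{policy_text}\n"
            "Summarize the agent's strategy as a short list of "
            "human-understandable decision rules."
        )

    # Toy example: a two-state inventory policy and one sampled episode.
    policy = {"low_inventory": "restock", "high_inventory": "sell"}
    episodes = [[("low_inventory", "restock", 0.0), ("high_inventory", "sell", 1.0)]]

    print(build_prompt(serialize_policy(policy)))          # policy-table format
    print(build_prompt(serialize_trajectories(episodes)))  # trajectory format

Per the abstract's findings, the trajectory format would be favored in low-stochasticity environments and the policy-table format in high-stochasticity ones.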
Recommended Citation
Huber, Joel Thomas, "Toward approachable reinforcement learning: Using dataflow application and large language models for human-understandable policies" (2025). Theses and Dissertations. 6784.
https://scholarsjunction.msstate.edu/td/6784