Theses and Dissertations

ORCID

https://orcid.org/0009-0009-7908-158X

Advisor

Swan, John

Committee Member

Banicescu, Ioana

Committee Member

Francom, Greg

Date of Degree

12-12-2025

Original Embargo Terms

Immediate Worldwide Access

Document Type

Graduate Thesis - Open Access

Major

Computer Science (Research Computer Science)

Degree Name

Master of Science (M.S.)

College

James Worth Bagley College of Engineering

Department

Department of Computer Science and Engineering

Abstract

Optimal decision-making under uncertainty is a challenge in everything from board games to business scenario planning. With visual programming systems, end-users can employ reinforcement learning (RL) agents to identify an optimal policy in their own stochastic environments. However, while an RL agent can demonstrate that an optimal policy exists, RL policies tend to be opaque and difficult to interpret. This thesis investigates how Large Reasoning Models (LRMs) can generate human-understandable rules from RL policies. I compare multiple approaches and models, varying both the format of the RL policy provided to the LRM and the prompting strategy. My results show that the efficacy of each approach depends on the environment: in low-stochasticity environments, LRMs reason more effectively by observing RL agent episode trajectories, whereas in high-stochasticity environments, LRMs reason more effectively by reviewing the RL agent’s policy itself. This work contributes to bridging the gap between RL’s computational power and the need for transparent, human-simulatable decision rules.
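
To make the contrast between the two policy formats concrete, here is a minimal sketch in Python. It is not drawn from the thesis: the one-dimensional gridworld, the slip probability, the toy policy, and the prompt wording are all invented for illustration. It shows the two serializations the abstract compares, a policy table presented directly versus sampled episode trajectories, each embedded in a prompt that could be sent to an LRM.

import random

# Hypothetical 1-D gridworld: states 0..4, goal at state 4.
# This stands in for any small stochastic environment.
N_STATES = 5
ACTIONS = ["left", "right"]

# Toy tabular policy mapping each state to a greedy action.
# In the setting the abstract describes, this would come from
# a trained RL agent.
policy = {s: "right" for s in range(N_STATES)}

def step(state, action, slip=0.1):
    """Transition with a small chance of slipping to a random action."""
    if random.random() < slip:
        action = random.choice(ACTIONS)
    delta = 1 if action == "right" else -1
    return max(0, min(N_STATES - 1, state + delta))

def serialize_policy(policy):
    """Format 1: present the policy table itself to the LRM."""
    lines = [f"state {s} -> {a}" for s, a in sorted(policy.items())]
    return "Policy table:\n" + "\n".join(lines)

def serialize_trajectories(policy, n_episodes=3, max_steps=10):
    """Format 2: present sampled episode trajectories instead."""
    episodes = []
    for _ in range(n_episodes):
        state, trace = 0, []
        for _ in range(max_steps):
            action = policy[state]
            nxt = step(state, action)
            trace.append(f"({state}, {action}) -> {nxt}")
            state = nxt
            if state == N_STATES - 1:
                break
        episodes.append("; ".join(trace))
    return "Episode trajectories:\n" + "\n".join(episodes)

PROMPT = (
    "Here is evidence of how a trained agent behaves:\n{evidence}\n"
    "State a short, human-readable rule describing its strategy."
)

if __name__ == "__main__":
    random.seed(0)
    print(PROMPT.format(evidence=serialize_policy(policy)))
    print()
    print(PROMPT.format(evidence=serialize_trajectories(policy)))

Running the script prints both prompt variants; in a study like the one described, each string would be sent to an LRM and the returned rules compared for fidelity to the agent's behavior and for human readability.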
