Hansen, A. Eric
Date of Degree
Graduate Thesis - Open Access
Master of Science
A partially observable Markov decision process (POMDP) is a mathematical framework for planning and control problems in which actions have stochastic effects and observations provide uncertain state information. It is widely used for research in decision-theoretic planning and reinforcement learning. % To cope with partial observability, a policy (or plan) must use memory, and previous work has shown that a finite-state controller provides a good policy representation. This thesis considers a previously-developed bounded policy iteration algorithm for POMDPs that finds policies that take the form of stochastic finite-state controllers. Two new improvements of this algorithm are developed. First improvement provides a simplification of the basic linear program, which is used to find improved controllers. This results in a considerable speed-up in efficiency of the original algorithm. Secondly, a branch and bound algorithm for adding the best possible node to the controller is presented, which provides an error bound and a test for global optimality. Experimental results show that these enhancements significantly improve the algorithm's performance.
Marwah, Gaurav, "Algorithms for stochastic finite memory control of partially observable systems" (2005). Theses and Dissertations MSU. 433.