Abstract
We present a formalization of algorithms for solving Markov Decision
Processes (MDPs) with formal guarantees on the optimality of their
solutions. In particular, we build on our analysis of the Bellman
operator for discounted infinite-horizon MDPs. From the iteration rule
of the Bellman operator we directly derive executable value iteration
and policy iteration algorithms that iteratively solve finite MDPs.
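To make the derivation concrete, here is a minimal Python sketch of value iteration in the standard textbook setting of a finite MDP given by per-action transition matrices and reward vectors with discount factor 0 < gamma < 1. The names and the stopping rule are illustrative assumptions for this sketch, not the verified code produced by the formalization.

```python
import numpy as np

def value_iteration(P, r, gamma, eps):
    """Iterate the Bellman optimality operator until convergence.

    P[a] is the |S| x |S| transition matrix of action a,
    r[a] the corresponding reward vector, and 0 < gamma < 1.
    """
    v = np.zeros(r.shape[1])
    while True:
        # Bellman optimality operator:
        # (L v)(s) = max_a [ r(s, a) + gamma * sum_s' p(s' | s, a) * v(s') ]
        v_new = np.max(r + gamma * (P @ v), axis=0)
        # Standard stopping rule: the greedy policy w.r.t. the returned
        # values is then eps-optimal (Puterman, Ch. 6).
        if np.max(np.abs(v_new - v)) < eps * (1 - gamma) / (2 * gamma):
            return v_new
        v = v_new
```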
We also prove correct optimized versions of value iteration that use
matrix splittings to improve the convergence rate; specifically, we
formally verify Gauss-Seidel value iteration and modified policy
iteration.
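To sketch the splitting idea: Gauss-Seidel value iteration sweeps the states in a fixed order and lets each update immediately use the new values of states updated earlier in the same sweep, which corresponds to splitting the transition matrices into lower- and strictly upper-triangular parts. The sketch below reuses the illustrative conventions of the value iteration example above; it is not the verified code.

```python
def gauss_seidel_value_iteration(P, r, gamma, eps):
    """Value iteration with in-place (Gauss-Seidel) state updates."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # v already holds the new values of states updated in this sweep
            q = r[:, s] + gamma * (P[:, s, :] @ v)
            new = np.max(q)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps * (1 - gamma) / (2 * gamma):
            return v
```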
The algorithms are evaluated on two standard examples from the
literature, namely inventory management and gridworld. Our
formalization covers most of Chapter 6 of Puterman's book
"Markov Decision Processes: Discrete Stochastic Dynamic
Programming".
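Modified policy iteration interleaves greedy policy improvement with a fixed number m of partial policy evaluation sweeps; m = 0 essentially recovers value iteration, while growing m approaches policy iteration. The following sketch follows the textbook scheme under the same illustrative conventions as above; initializing with a lower bound on the optimal values is the usual assumption for monotone convergence.

```python
def modified_policy_iteration(P, r, gamma, m, eps):
    """Greedy improvement plus m partial evaluation sweeps per iteration."""
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    # start below the optimal values to guarantee monotone convergence
    v = np.full(n_states, r.min() / (1 - gamma))
    while True:
        q = r + gamma * (P @ v)           # action values w.r.t. v
        policy = np.argmax(q, axis=0)     # greedy policy improvement
        v_new = q[policy, states]         # one Bellman operator application
        if np.max(np.abs(v_new - v)) < eps * (1 - gamma) / (2 * gamma):
            return v_new, policy
        v = v_new
        for _ in range(m):                # partial evaluation of greedy policy
            v = r[policy, states] + gamma * (P[policy, states, :] @ v)
```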
History
- March 1, 2023: Added modified policy iteration (MPI) and Gauss-Seidel iteration; improved code generation.
Session MDP-Algorithms
- MDP_fin
- Value_Iteration
- DiffArray_Base
- DiffArray_ST
- Code_Setup
- VI_Code
- Code_Real_Approx_By_Float_Fix
- VI_Code_Export_Float
- VI_Code_Export_Rat
- Policy_Iteration
- Splitting_Methods
- Splitting_Methods_Fin
- GS_Code
- GS_Code_Export_Float
- GS_Code_Export_Rat
- Modified_Policy_Iteration
- MPI_Code
- MPI_Code_Export_Float
- MPI_Code_Export_Rat
- Blinfun_To_Matrix
- Policy_Iteration_Fin
- PI_Code
- PI_Code_Export_Float
- PI_Code_Export_Rat
- Backward_Induction
- Fin_Code
- Fin_Code_Export_Float
- Fin_Code_Export_Rat