Verified Algorithms for Solving Markov Decision Processes

Maximilian Schäffeler 📧 and Mohammad Abdulaziz 📧

December 16, 2021

This is a development version of this entry. It might change over time and is not stable. Please refer to release versions for citations.


We present a formalization of algorithms for solving Markov Decision Processes (MDPs) with formal guarantees on the optimality of their solutions. In particular we build on our analysis of the Bellman operator for discounted infinite horizon MDPs. From the iterator rule on the Bellman operator we directly derive executable value iteration and policy iteration algorithms to iteratively solve finite MDPs. We also prove correct optimized versions of value iteration that use matrix splittings to improve the convergence rate. In particular, we formally verify Gauss-Seidel value iteration and modified policy iteration. The algorithms are evaluated on two standard examples from the literature, namely, inventory management and gridworld. Our formalization covers most of chapter 6 in Puterman's book "Markov Decision Processes: Discrete Stochastic Dynamic Programming".
BSD License


Theories of MDP-Algorithms