Markov Decision Processes with Rewards

Title: Markov Decision Processes with Rewards
Authors: Maximilian Schäffeler (schaeffm /at/ in /dot/ tum /dot/ de) and Mohammad Abdulaziz
Submission date: 2021-12-16
Abstract: We present a formalization of Markov Decision Processes with rewards. In particular, we first build on Hölzl's formalization of MDPs (AFP entry: Markov_Models) and extend it with rewards. We proceed with an analysis of the expected total discounted reward criterion for infinite-horizon MDPs. The central result is the construction of the iteration rule for the Bellman operator. We prove the optimality equations for this operator and show the existence of an optimal stationary deterministic solution. The analysis can be used to obtain dynamic programming algorithms, such as value iteration and policy iteration, to solve MDPs with formal guarantees. Our formalization is based on chapters 5 and 6 of Puterman's book "Markov Decision Processes: Discrete Stochastic Dynamic Programming".
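
For orientation, the fixed-point argument at the heart of the entry can be sketched as follows. In Puterman's notation (which this development follows; λ is the discount factor, 0 ≤ λ < 1), the Bellman operator on bounded value functions is, roughly,

    (\mathcal{L} v)(s) = \sup_{a \in A(s)} \Big( r(s, a) + \lambda \sum_{s'} p(s' \mid s, a)\, v(s') \Big),

and the optimality equation says that the optimal value function is the unique fixed point of \mathcal{L}, which is a contraction with modulus λ on the space of bounded functions. This paraphrases the mathematics, not the Isabelle text itself.

To illustrate how the fixed-point analysis yields value iteration, here is a minimal sketch for a finite MDP (the formalization itself is more general); all names (P, R, gamma, tol) are hypothetical and not part of the entry:

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        """P[a][s, s']: transition matrix of action a; R[s, a]: reward.

        Hypothetical illustration; not code from the AFP entry.
        """
        n_states, n_actions = R.shape
        v = np.zeros(n_states)
        while True:
            # One application of the Bellman operator: for each state,
            # maximize one-step reward plus discounted expected value.
            q = np.stack([R[:, a] + gamma * (P[a] @ v)
                          for a in range(n_actions)], axis=1)
            v_new = q.max(axis=1)
            # Standard stopping rule for an eps-optimal approximation.
            if np.max(np.abs(v_new - v)) < tol * (1 - gamma) / (2 * gamma):
                return v_new, q.argmax(axis=1)
            v = v_new

The greedy policy returned in the second component is stationary and deterministic, matching the existence result stated in the abstract.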
BibTeX:
@article{MDP-Rewards-AFP,
  author  = {Maximilian Schäffeler and Mohammad Abdulaziz},
  title   = {Markov Decision Processes with Rewards},
  journal = {Archive of Formal Proofs},
  month   = dec,
  year    = 2021,
  note    = {\url{https://isa-afp.org/entries/MDP-Rewards.html},
            Formal proof development},
  ISSN    = {2150-914x},
}
License: BSD License
Used by: MDP-Algorithms
Status: ok. This is a development version of this entry; it may change over time and is not stable. Please refer to release versions for citations.