Abstract
A partially observable Markov decision process (POMDP) is an appropriate mathematical modeling tool for dynamic stochastic systems where portions or all of the system states are not completely observable to the decision maker. In this respect, POMDPs generalize completely observable Markov decision processes (MDPs) by allowing infinitely many states to address partial observability. However, the resulting models suffer tremendously from computational intractability even for relatively small problems. Therefore, POMDPs are frequently approximated by solving variants of completely observable MDPs defined on a grid of finite states. This article summarizes the relationships between completely and partially observable MDPs and derives inequalities for the POMDP value function using the optimal value function of the grid-based MDPs.
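The reduction summarized above rests on treating the decision maker's belief (a probability distribution over the hidden states) as the state of a completely observable MDP. A minimal sketch of the Bayes belief update behind that idea follows. The function name `belief_update`, the array layout, and the two-state model are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def belief_update(belief, trans, obs, action, observation):
    """Return the posterior belief and the probability of the observation.

    Assumed (hypothetical) layout:
      trans[a][s][s2] = P(next state s2 | state s, action a)
      obs[a][s2][z]   = P(observation z | next state s2, action a)
    """
    n = len(belief)
    # Prediction step: distribution over next states before observing.
    predicted = [
        sum(belief[s] * trans[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight each next state by the observation likelihood.
    unnormalized = [
        predicted[s2] * obs[action][s2][observation] for s2 in range(n)
    ]
    prob_obs = sum(unnormalized)
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / prob_obs for u in unnormalized], prob_obs


# Hypothetical model: two states, one action, two observations.
trans = [[[0.9, 0.1],
          [0.2, 0.8]]]
obs = [[[0.8, 0.2],
        [0.3, 0.7]]]

new_belief, prob_obs = belief_update([0.5, 0.5], trans, obs, action=0, observation=0)
```

Because the updated belief can be any point of the probability simplex, the belief MDP has a continuum of states. A grid-based approximation, as described in the abstract, would restrict attention to a finite set of such belief points.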
| Original language | English |
|---|---|
| Title of host publication | Wiley Encyclopedia of Operations Research and Management Science |
| Publisher | Wiley |
| Pages | 1-9 |
| Number of pages | 9 |
| ISBN (Electronic) | 9780470400531 |
| ISBN (Print) | 9780470400630 |
| DOIs | |
| Publication status | Published - 1 Jan 2010 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2010 John Wiley & Sons, Inc. All rights reserved.
Keywords
- grid-based approximation
- linear programming
- MDP
- POMDP
Fingerprint
Dive into the research topics of 'Reduction of a POMDP to an MDP'. Together they form a unique fingerprint.