Dynamic Programming and Optimal Control

Table of Contents:

**The Dynamic Programming Algorithm**

- Introduction
- The Basic Problem
- The Dynamic Programming Algorithm
- State Augmentation and Other Reformulations
- Some Mathematical Issues
- Dynamic Programming and Minimax Control
- Notes, Sources, and Exercises

**Deterministic Systems and the Shortest Path Problem**

- Finite-State Systems and Shortest Paths
- Some Shortest Path Applications
- Critical Path Analysis
- Hidden Markov Models and the Viterbi Algorithm

- Shortest Path Algorithms
- Label Correcting Methods - A* Algorithm
- Branch-and-Bound
- Constrained and Multiobjective Problems

- Notes, Sources, and Exercises

**Problems with Perfect State Information**

- Linear Systems and Quadratic Cost
- Inventory Control
- Dynamic Portfolio Analysis
- Optimal Stopping Problems
- Scheduling and the Interchange Argument
- Set-Membership Description of Uncertainty
- Set-Membership Estimation
- Control with Unknown-but-Bounded Disturbances

- Notes, Sources, and Exercises

**Problems with Imperfect State Information**

- Reductions to the Perfect Information Case
- Linear Systems and Quadratic Cost
- Minimum Variance Control of Linear Systems
- Sufficient Statistics and Finite-State Markov Chains
- Sequential Hypothesis Testing
- Notes, Sources, and Exercises

**Introduction to Infinite Horizon Problems**

- An Overview
- Stochastic Shortest Path Problems
- Discounted Problems
- Average Cost Problems
- Semi-Markov Problems
- Notes, Sources, and Exercises

**Approximate Dynamic Programming**

- Cost Approximation and Limited Lookahead
- Error Bounds and Cost Improvement
- Computation of Suboptimal Policies - Stochastic Programming

- Problem Approximation
- Enforced Decomposition
- Probabilistic Approximation - Certainty Equivalent Control
- Aggregation

- Parametric Cost Approximation
- Feature-Based Architectures and Neural Networks
- Sequential Dynamic Programming Approximation
- Q-Factor Parametric Approximation
- Parametric Approximation in Infinite Horizon Problems
- Computer Chess

- On-Line Approximation and Optimization
- Rollout Algorithms
- Rollout for Discrete Deterministic Problems
- Model Predictive Control
- Open-Loop Feedback Control

- Simulation-Based Cost-to-go Approximation
- Stochastic Rollout and Monte Carlo Tree Search
- Variance Reduction in Rollout

- Approximation in Policy Space
- Adaptive Control
- Discretization Issues
- Notes, Sources, and Exercises

**Deterministic Continuous-Time Optimal Control**

- Continuous-Time Optimal Control
- The Hamilton-Jacobi-Bellman Equation
- The Pontryagin Minimum Principle
- An Informal Derivation Using the HJB Equation
- A Derivation Based on Variational Ideas
- The Minimum Principle for Discrete-Time Problems

- Extensions of the Minimum Principle
- Fixed Terminal State
- Free Initial State
- Free Terminal Time
- Time-Varying System and Cost
- Singular Problems

- Notes, Sources, and Exercises

**Appendix A: Mathematical Review**

**Appendix B: On Optimization Theory**

**Appendix C: On Probability Theory**

**Appendix D: On Finite-State Markov Chains**

**Appendix E: Kalman Filtering**

**Appendix F: Formulating Problems of Decision Under Uncertainty**

**Discounted Problems - Theory**

- Minimization of Total Cost - Introduction
- The Finite-Horizon DP Algorithm
- Shorthand Notation and Monotonicity
- A Preview of Infinite Horizon Results
- Randomized and History-Dependent Policies

- Discounted Problems - Bounded Cost per Stage
- Scheduling and Multiarmed Bandit Problems
- Discounted Continuous-Time Problems
- The Role of Contraction Mappings
- Sup-Norm Contractions
- Discounted Problems - Unbounded Cost per Stage

- General Forms of Discounted Dynamic Programming
- Basic Results Under Contraction and Monotonicity
- Discounted Dynamic Games

- Notes, Sources, and Exercises

**Discounted Problems - Computational Methods**

- Markovian Decision Problems
- Value Iteration
- Monotonic Error Bounds for Value Iteration
- Variants of Value Iteration
- Q-Learning

- Policy Iteration
- Policy Iteration for Costs
- Policy Iteration for Q-Factors
- Optimistic Policy Iteration
- Limited Lookahead Policies and Rollout

- Linear Programming Methods
- Methods for General Discounted Problems
- Limited Lookahead Policies and Approximations
- Generalized Value Iteration
- Approximate Value Iteration
- Generalized Policy Iteration
- Generalized Optimistic Policy Iteration
- Approximate Policy Iteration
- Mathematical Programming

- Asynchronous Algorithms
- Asynchronous Value Iteration
- Asynchronous Policy Iteration
- Policy Iteration with a Uniform Fixed Point

- Notes, Sources, and Exercises

**Stochastic Shortest Path Problems**

- Problem Formulation
- Main Results
- Underlying Contraction Properties
- Value Iteration
- Conditions for Finite Termination
- Asynchronous Value Iteration

- Policy Iteration
- Optimistic Policy Iteration
- Approximate Policy Iteration
- Policy Iteration with Improper Policies
- Policy Iteration with a Uniform Fixed Point

- Countable State Spaces
- Notes, Sources, and Exercises

**Undiscounted Problems**

- Unbounded Costs per Stage
- Main Results
- Value Iteration
- Other Computational Methods

- Linear Systems and Quadratic Cost
- Inventory Control
- Optimal Stopping
- Optimal Gambling Strategies
- Nonstationary and Periodic Problems
- Notes, Sources, and Exercises

**Average Cost per Stage Problems**

- Finite-Spaces Average Cost Models
- Relation with the Discounted Cost Problem
- Blackwell Optimal Policies
- Optimality Equations

- Conditions for Equal Average Cost for all Initial States
- Value Iteration
- Single-Chain Value Iteration
- Multi-Chain Value Iteration

- Policy Iteration
- Single-Chain Policy Iteration
- Multi-Chain Policy Iteration

- Linear Programming
- Infinite-Spaces Problems
- A Sufficient Condition for Optimality
- Finite State Space and Infinite Control Space
- Countable States -- Vanishing Discount Approach
- Countable States -- Contraction Approach
- Linear Systems with Quadratic Cost

- Notes, Sources, and Exercises

**Approximate Dynamic Programming - Discounted Models**

- General Issues of Simulation-Based Cost Approximation
- Approximation Architectures
- Simulation-Based Approximate Policy Iteration
- Direct and Indirect Approximation
- Monte Carlo Simulation
- Simplifications

- Direct Policy Evaluation - Gradient Methods
- Projected Equation Methods for Policy Evaluation
- The Projected Bellman Equation
- The Matrix Form of the Projected Equation
- Simulation-Based Methods
- LSTD, LSPE, and TD(0) Methods
- Optimistic Versions
- Multistep Simulation-Based Methods
- A Synopsis

- Policy Iteration Issues
- Exploration Enhancement by Geometric Sampling
- Exploration Enhancement by Off-Policy Methods
- Policy Oscillations - Chattering

- Aggregation Methods
- Cost Approximation via the Aggregate Problem
- Cost Approximation via the Enlarged Problem
- Multistep Aggregation
- Asynchronous Distributed Aggregation

- Q-Learning
- Q-Learning: A Stochastic VI Algorithm
- Q-Learning and Policy Iteration
- Q-Factor Approximation and Projected Equations
- Q-Learning for Optimal Stopping Problems
- Q-Learning and Aggregation
- Finite Horizon Q-Learning

- Notes, Sources, and Exercises

**Approximate Dynamic Programming - Nondiscounted Models and Generalizations**

- Stochastic Shortest Path Problems
- Average Cost Problems
- Approximate Policy Evaluation
- Approximate Policy Iteration
- Q-Learning for Average Cost Problems

- General Problems and Monte Carlo Linear Algebra
- Projected Equations
- Matrix Inversion and Iterative Methods
- Multistep Methods
- Extension of Q-Learning for Optimal Stopping
- Equation Error Methods
- Oblique Projections
- Generalized Aggregation
- Deterministic Methods for Singular Linear Systems
- Stochastic Methods for Singular Linear Systems

- Approximation in Policy Space
- The Gradient Formula
- Computing the Gradient by Simulation
- Essential Features for Gradient Evaluation
- Approximations in Policy and Value Space

- Notes, Sources, and Exercises

**Appendix A: Measure-Theoretic Issues in Dynamic Programming**

- A Two-Stage Example
- Resolution of the Measurability Issues
