Abstract

In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there always exists a Markov strategy that uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite-horizon payoffs are not useful for the general average reward problem.
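
To make "average time spent at a goal" concrete, here is a minimal sketch that evaluates one fixed stationary strategy on a toy chain. It is an illustration only, not the paper's construction: the function name `average_time_at_goal`, the transition matrix `P`, and the two-state example are all hypothetical, and the sketch assumes the payoff can be read as the expected fraction of the first `horizon` steps spent in the goal set. The abstract does not give the exact definition, and the paper's payoff concerns long-run behavior rather than any single finite horizon.

```python
def average_time_at_goal(P, start, goal, horizon):
    """Expected fraction of the first `horizon` steps spent in `goal`.

    P       -- row-stochastic transition matrix induced by one fixed
               stationary strategy (hypothetical example input)
    start   -- index of the initial state
    goal    -- set of state indices counted as the goal
    horizon -- number of steps to average over (must be >= 1)
    """
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0  # all probability mass on the initial state
    total = 0.0
    for _ in range(horizon):
        # Advance the state distribution by one step: dist <- dist * P
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Add the probability of being in the goal at this step
        total += sum(dist[j] for j in goal)
    return total / horizon


# Toy two-state chain that alternates deterministically between 0 and 1;
# state 1 is the goal, so half of the steps are spent there.
alternating = [[0.0, 1.0],
               [1.0, 0.0]]
print(average_time_at_goal(alternating, start=0, goal={1}, horizon=4))
```

For a strategy that moves to the goal and stays there, the same function returns 1.0 at every horizon. A Markov (time-dependent) strategy would need a different matrix at each step, which this sketch does not model.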

Disciplines

Mathematics


URL: https://digitalcommons.calpoly.edu/rgp_rsr/55