Postprint version. Published in Stochastic Processes and Their Applications, Volume 17, Issue 2, July 1, 1984, pages 349-357.
Copyright © 1984 Elsevier.
NOTE: At the time of publication, the author Theodore P. Hill was not yet affiliated with Cal Poly.
The definitive version is available at https://doi.org/10.1016/0304-4149(84)90010-3.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite horizon payoffs are not useful for the general average reward problem.
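As a rough illustration of the stationary case only, the sketch below enumerates the stationary strategies of a tiny two-state process with finite (hence closed) decision sets and estimates, for each starting state, the average time spent at the goal. It is not the paper's construction. The names `GAMBLES`, `average_time_at_goal`, and `best_stationary`, the transition probabilities, and the truncation length `horizon` are all assumptions made up for this example. The truncated Cesàro average is used only to evaluate one fixed Markov chain at a time, not to approximate the optimal strategy by finite-horizon problems.

```python
from itertools import product

# Hypothetical example: states 0 and 1, with state 1 as the goal.
# GAMBLES[s] lists the gambles available at state s; each gamble is a
# probability distribution over the next state.
GAMBLES = [
    [[1.0, 0.0], [0.5, 0.5]],  # at state 0: stay put, or reach the goal w.p. 1/2
    [[1.0, 0.0], [0.1, 0.9]],  # at state 1: return to 0, or stay at the goal w.p. 0.9
]


def average_time_at_goal(P, goal, start, horizon=2000):
    """Estimate (1/n) * sum_{k=1..n} P(X_k = goal | X_0 = start) for n = horizon."""
    n_states = len(P)
    dist = [0.0] * n_states
    dist[start] = 1.0
    total = 0.0
    for _ in range(horizon):
        # Advance the state distribution by one step of the chain P.
        dist = [sum(dist[i] * P[i][j] for i in range(n_states))
                for j in range(n_states)]
        total += dist[goal]
    return total / horizon


def best_stationary(gambles, goal, horizon=2000):
    """Brute-force search over stationary strategies (one gamble per state).

    Strategies are ranked by the sum of their values over all starting
    states; a uniformly optimal strategy, when one exists, maximizes this sum.
    """
    n_states = len(gambles)
    best_choice, best_values = None, None
    for choice in product(*(range(len(g)) for g in gambles)):
        P = [gambles[s][choice[s]] for s in range(n_states)]
        values = [average_time_at_goal(P, goal, s, horizon)
                  for s in range(n_states)]
        if best_values is None or sum(values) > sum(best_values):
            best_choice, best_values = choice, values
    return best_choice, best_values


choice, values = best_stationary(GAMBLES, goal=1)
print(choice, [round(v, 3) for v in values])
```

In this toy process the selected strategy uses the second gamble at both states, and its estimated value is close to 5/6 from either starting state, which is the stationary probability of the goal under the resulting chain. Brute-force enumeration is only feasible because the decision sets here are finite and very small.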