Recommended Citation
Postprint version. Published in Mathematics of Operations Research, Volume 12, Issue 3, August 1, 1987, pages 463-473.
Abstract
Suppose you are in a casino with a number of dollars you wish to gamble. You may quit whenever you please, and your objective is to find a strategy which will maximize the probability that you reach some goal, say $1000. In formal gambling-theoretic terminology, since there are only a finite number of dollars in the world, and since you may quit and leave whenever you wish, this is a finite-state leavable gambling problem [4], and the classical result of Dubins and Savage [4, Theorem 3.9.2.] says that for each ε > 0 there is always a stationary strategy which is uniformly ε-optimal. That is, there is always a strategy for betting in which the bet you place at each play depends only on your current fortune, and using this strategy your expected fortune at the time you quit gambling is within ε of the most you could expect under any strategy. In general, optimal stationary strategies do not always exist, even in finite-state leavable gambling problems [4, Example 3.9.2.], although they do if the number of bets available for each fortune is also finite [4, Theorem 3.9.1.], an assumption which certainly does not hold in a casino with an oddsmaker (someone who will let you bet any amount on practically any future event - he simply sets odds he considers favourable to the house). An ε-optimal stationary strategy is by definition quite good, but it does have the disadvantage that it is not getting any better, and in general always remains ε away from optimal at some states.
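To make the notion of a stationary strategy concrete, the following minimal sketch treats the easy case mentioned above, in which each fortune has only finitely many bets available. It is an illustration under stated assumptions, not the paper's construction: the game (integer even-money bets won with probability `p_win`), the function name `solve_leavable_gamble`, and all parameter names are hypothetical. Value iteration is started from the utility of quitting immediately, so the successive value estimates never decrease.

```python
def solve_leavable_gamble(goal, p_win, tol=1e-12, max_sweeps=10_000):
    """Value iteration for a toy finite-state leavable gambling problem.

    Assumed model (not from the paper): fortunes are 0..goal; at fortune s
    the gambler may quit, or stake an integer b with 1 <= b <= min(s, goal - s),
    moving to s + b with probability p_win and to s - b otherwise.
    The utility of quitting is 1 at the goal and 0 elsewhere, so the value
    of a state is the best achievable probability of reaching the goal.
    """
    utility = [0.0] * goal + [1.0]
    value = utility[:]          # start from "quit immediately"
    bets = [0] * (goal + 1)     # bets[s] == 0 means "quit at fortune s"

    for _ in range(max_sweeps):
        new_value = value[:]
        for s in range(1, goal):
            best, best_bet = utility[s], 0   # quitting is always allowed
            for b in range(1, min(s, goal - s) + 1):
                q = p_win * value[s + b] + (1 - p_win) * value[s - b]
                if q > best:
                    best, best_bet = q, b
            new_value[s], bets[s] = best, best_bet
        change = max(abs(a - b) for a, b in zip(new_value, value))
        value = new_value
        if change < tol:
            break

    # bets depends only on the current fortune: a stationary strategy.
    return value, bets


if __name__ == "__main__":
    values, bets = solve_leavable_gamble(goal=10, p_win=0.5)
    for fortune, (v, b) in enumerate(zip(values, bets)):
        print(fortune, round(v, 6), b)
```

In the fair case (`p_win = 0.5`) the computed value at fortune `s` is `s / goal`. The returned list `bets` prescribes a stake as a function of the current fortune alone, which is exactly what makes the strategy stationary.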
The purpose of this paper is to introduce the notion of a strategy which is monotonically improving and optimal in the limit, and to prove that such strategies exist in all finite-state leavable gambling problems and in all finite-state Markov decision processes with positive, negative, and discounted pay-offs; in fact even Markov strategies [6] with these properties are shown to exist. The questions of whether monotonically improving limit-optimal (MILO) strategies exist in nonleavable finite-state gambling problems, in finite-state average reward Markov decision processes, or in countable state problems (with various pay-offs) are left open.
Copyright
© 1987 INFORMS
Number of Pages
11
Publisher Statement
https://www.jstor.org/stable/3689977
URL: https://digitalcommons.calpoly.edu/rgp_rsr/94