Chapter

Markov Multi-armed Bandit

Abstract

In many application domains, temporal changes in the reward distribution are modeled as a Markov chain. In this chapter, we present the formulation, theoretical bounds, and algorithms for the Markov MAB problem, where rewards are characterized by unknown irreducible Markov processes. Two important classes of the problem are discussed: rested and restless Markov MAB.
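The rested/restless distinction can be made concrete with a small simulation. The sketch below is illustrative and not from the chapter: each arm's reward follows a two-state Markov chain (state names, transition matrices, and reward values are assumptions). In the rested case only the pulled arm's chain evolves; in the restless case every arm's chain evolves each round.

```python
import random

class MarkovArm:
    """Hypothetical arm whose reward is governed by a two-state Markov chain."""

    def __init__(self, transition, rewards, state=0, rng=None):
        self.P = transition      # 2x2 row-stochastic transition matrix (assumed)
        self.rewards = rewards   # reward emitted in each state (assumed)
        self.state = state
        self.rng = rng or random.Random(0)

    def step(self):
        # Advance the chain one step, then return the new state's reward.
        p_stay = self.P[self.state][self.state]
        if self.rng.random() >= p_stay:
            self.state = 1 - self.state
        return self.rewards[self.state]

def play(arms, choices, restless=True):
    """Total reward for a sequence of arm pulls.

    Rested:   only the pulled arm's chain evolves.
    Restless: every arm's chain evolves each round, pulled or not.
    """
    total = 0.0
    for k in choices:
        if restless:
            for i, arm in enumerate(arms):
                r = arm.step()
                if i == k:
                    total += r
        else:
            total += arms[k].step()
    return total
```

Because unpulled restless arms keep evolving, their states become stale from the learner's perspective, which is why restless bandits require different index policies and regret analyses than rested ones.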

Authors

Zheng R; Hua C

Book title

Wireless Networks

Pagination

pp. 27-39

Publication Date

January 1, 2016

DOI

10.1007/978-3-319-50502-2_3