Chapter

Markov Multi-armed Bandit

Abstract

In many application domains, temporal changes in the reward distribution are modeled as a Markov chain. In this chapter, we present the formulation, theoretical bounds, and algorithms for the Markov MAB problem, where rewards are characterized by unknown irreducible Markov processes. Two important classes of the problem are discussed: rested and restless Markov MAB.
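The rested/restless distinction can be made concrete with a small simulation. The sketch below is illustrative and not from the chapter: each arm's reward follows a two-state Markov chain (state names, transition matrices, and reward values are assumptions). In the rested case only the pulled arm's chain evolves; in the restless case every arm's chain evolves each round.

```python
import random

class MarkovArm:
    """Hypothetical arm whose reward is governed by a two-state Markov chain."""

    def __init__(self, transition, rewards, state=0, rng=None):
        self.P = transition      # 2x2 row-stochastic transition matrix (assumed)
        self.rewards = rewards   # reward emitted in each state (assumed)
        self.state = state
        self.rng = rng or random.Random(0)

    def step(self):
        # Advance the chain one step, then return the new state's reward.
        p_stay = self.P[self.state][self.state]
        if self.rng.random() >= p_stay:
            self.state = 1 - self.state
        return self.rewards[self.state]

def play(arms, choices, restless=True):
    """Total reward for a sequence of arm pulls.

    Rested:   only the pulled arm's chain evolves.
    Restless: every arm's chain evolves each round, pulled or not.
    """
    total = 0.0
    for k in choices:
        if restless:
            for i, arm in enumerate(arms):
                r = arm.step()
                if i == k:
                    total += r
        else:
            total += arms[k].step()
    return total
```

Because unpulled restless arms keep evolving, their states become stale from the learner's perspective, which is why restless bandits require different index policies and regret analyses than rested ones.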

Authors

Zheng R; Hua C

Book title

Wireless Networks

Pagination

pp. 27-39

Publication Date

January 1, 2016

DOI

10.1007/978-3-319-50502-2_3