Journal article

A novel double-mGBDT-based Q-learning

Abstract

This paper proposes a novel double-mGBDT-based Q-learning algorithm. In contrast to traditional deep reinforcement learning, the proposed algorithm replaces the DNN with an mGBDT, which is introduced as the function approximator. During learning, the Bellman equation is used to construct a target value from the observed state, and this target trains the mGBDT in an online manner. As in DQN, we adopt two mGBDT frameworks to address the problem of easy divergence. To evaluate performance, we apply the proposed algorithm, DQN, and mGBDT to the classic CartPole and MountainCar benchmark problems. The results show that the proposed algorithm converges to the optimal policy and, compared with DQN, is considerably more stable after convergence.
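The abstract describes the algorithm only at a high level, so a minimal sketch may help readers place it. The Python below illustrates the double-estimator idea: a Bellman target y = r + γ · max_a' Q_target(s', a') is computed from a second, periodically synced approximator, the same role DQN's target network plays. The paper's mGBDT (presumably the multi-layered gradient boosted decision trees of Feng et al.) has no standard library implementation, so scikit-learn's GradientBoostingRegressor stands in as a hypothetical placeholder; GAMMA, N_ACTIONS, and all class and method names are illustrative assumptions, not the authors' code.

import copy
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

GAMMA = 0.99       # discount factor (assumed value, not from the paper)
N_ACTIONS = 2      # e.g. CartPole's two discrete actions

class DoubleTreeQ:
    """Q-learning with two tree-based approximators: `online` is refit on
    Bellman targets, `target` supplies the bootstrap value and is synced
    periodically to curb divergence (the double-framework idea from DQN)."""

    def __init__(self):
        self.online = None   # set after the first fit
        self.target = None   # set after the first sync

    def _q(self, model, states):
        # One regressor over (state, action) features; evaluate every action.
        if model is None:
            return np.zeros((len(states), N_ACTIONS))
        feats = np.asarray([np.hstack([np.asarray(s, dtype=float), [a]])
                            for s in states for a in range(N_ACTIONS)])
        return model.predict(feats).reshape(len(states), N_ACTIONS)

    def act(self, state):
        # Greedy action under the online approximator.
        return int(self._q(self.online, [state])[0].argmax())

    def train_step(self, states, actions, rewards, next_states, dones):
        rewards = np.asarray(rewards, dtype=float)
        dones = np.asarray(dones, dtype=float)
        # Bellman target: y = r + gamma * max_a' Q_target(s', a'); 0 at terminals.
        next_q = self._q(self.target, next_states).max(axis=1)
        y = rewards + GAMMA * next_q * (1.0 - dones)
        X = np.asarray([np.hstack([np.asarray(s, dtype=float), [a]])
                        for s, a in zip(states, actions)])
        # GradientBoostingRegressor cannot be updated incrementally, so this
        # sketch refits from scratch each call; the paper trains the mGBDT online.
        model = GradientBoostingRegressor(n_estimators=50)
        model.fit(X, y)
        self.online = model

    def sync_target(self):
        # Periodically copy the online approximator into the target slot.
        self.target = copy.deepcopy(self.online)

A training loop would interleave environment steps via act (plus some exploration), calls to train_step on batches of stored transitions, and an occasional sync_target; the periodic sync is what the abstract's second mGBDT framework provides.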

Authors

Fu Q; Ma S; Tian D; Chen J; Gao Z; Zhong S

Journal

International Journal of Modelling, Identification and Control, Vol. 37, No. 3-4, pp. 232–239

Publisher

Inderscience Publishers

Publication Date

January 1, 2021

DOI

10.1504/ijmic.2021.121827

ISSN

1746-6172