Dual Network DQN Algorithm Based on Second-order Temporal Difference Error

Abstract

To address the poor convergence stability caused by overestimation in the Deep Q-Network (DQN) algorithm, this paper extends the traditional Temporal Difference (TD) error to propose the concept of an n-order TD error, and designs a dual-network DQN algorithm based on the second-order TD error. A value function update formula based on the second-order TD error is constructed. A dual-network model is then built on the DQN algorithm: two isomorphic value function networks represent the value functions of two successive rounds, and their parameters are updated cooperatively to improve the stability of value function estimation in DQN. Experimental results on the OpenAI Gym platform show that the proposed algorithm has better convergence stability than the classical DQN algorithm on the Mountain Car and Cart Pole problems.
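For orientation, the traditional one-step TD error that the abstract takes as its starting point can be sketched as follows. This is a minimal tabular Q-learning illustration, not the paper's method: the abstract does not state the second-order update formula, so it is not reproduced here. The function names, the table layout, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
# Minimal sketch of the classical one-step TD error (tabular Q-learning).
# Illustrative only: names and hyperparameters are assumptions, and the
# paper's second-order TD error is NOT implemented here.

def td_error(q, state, action, reward, next_state, gamma=0.99, done=False):
    """Return r + gamma * max_a' Q(s', a') - Q(s, a)."""
    # The max over next-state action values is the bootstrap term; taking a
    # max over noisy estimates is the usual source of overestimation in DQN.
    bootstrap = 0.0 if done else max(q[next_state])
    target = reward + gamma * bootstrap
    return target - q[state][action]


def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, done=False):
    """Move Q(s, a) toward the TD target by step size alpha; return the TD error."""
    delta = td_error(q, state, action, reward, next_state, gamma, done)
    q[state][action] += alpha * delta
    return delta


# Two states, two actions; q[s][a] holds the current value estimate.
q = [[0.0, 0.0], [1.0, 2.0]]
delta = q_update(q, state=0, action=1, reward=-1.0, next_state=1,
                 alpha=0.5, gamma=0.9)
print(delta, q[0][1])
```

In DQN the table is replaced by a neural network and the same error drives the loss. According to the abstract, the proposed algorithm keeps two isomorphic networks for two successive rounds rather than one.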

Authors

Chen J; Zhou X; Fu Q; Gao Z; Fu B; Wu H

Journal

Jisuanji Gongcheng Computer Engineering, Vol. 46, No. 5, pp. 78–93

Publication Date

January 1, 2020

DOI

10.19678/j.issn.1000-3428.0054557

ISSN

1000-3428

Associated Experts

Zhen Gao

Associate Professor, Faculty of Engineering
