Scopus Harvesting Series

Uncertainty quantification for operators in online reinforcement learning

Authors

Bi Wang, Jiangxi University of Science and Technology
Jianqing Wu, Jiangxi University of Science and Technology
Xuelian Li, Nanjing University of Posts and Telecommunications
Jun Shen, University of Wollongong
Yangjun Zhong, Jiangxi University of Science and Technology

Publication Name

Knowledge-Based Systems

Abstract

In online reinforcement learning, operators predict the return by weighting the successors' estimated values. However, because uncertainty is not quantified, the weights assigned by operators are affected by potentially biased estimates. As a result, the partial order of the estimated values is ineffective. To increase the probability of outputting an optimal partial order, this paper introduces the hedonistic expected value (HEV), an upper bound on the expectation of the return, to quantify the uncertainty. Notably, for compatibility, some complex operators are rewritten in weighted-sum form. Based on the weighted-sum form of the operator, a variant of Q-learning, namely uncertainty-quantification-based Q-learning, is proposed. In the proposed algorithm, the weights assigned by the HEV of the successors are compatible with existing operators. The predicted return is a sum weighted not only by the operator but also by the HEV, through re-weighting. The greediness of the re-weighted operator is unchanged, and the contraction mapping indicates that convergence can be maintained. We demonstrate that the proposed algorithm with HEV performs favorably in practice.
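The weighted-sum view described in the abstract can be illustrated with a minimal sketch. The abstract does not give the definition of HEV or the exact re-weighting rule, so everything here is an assumption for illustration: `upper_bound` is a hypothetical count-based stand-in for HEV, `reweight` uses a simple elementwise product with renormalisation, and all function names and parameters are invented for this example rather than taken from the paper.

```python
import math

def greedy_weights(q):
    """Max operator in weighted-sum form: all weight on the arg-max successor(s)."""
    best = max(q)
    hits = [1.0 if v == best else 0.0 for v in q]
    total = sum(hits)
    return [h / total for h in hits]

def softmax_weights(values, temperature=1.0):
    """Boltzmann weights, shifted by the maximum for numerical stability."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def upper_bound(q, counts, c=1.0):
    """Hypothetical stand-in for HEV (assumption): estimate plus a visit-count bonus."""
    return [v + c / math.sqrt(n + 1) for v, n in zip(q, counts)]

def reweight(op_weights, uq_weights):
    """Assumed re-weighting rule: elementwise product, then renormalise to sum to 1."""
    prod = [a * b for a, b in zip(op_weights, uq_weights)]
    total = sum(prod)
    if total == 0.0:
        return list(op_weights)
    return [p / total for p in prod]

def backup(reward, q_next, counts, gamma=0.99, op=softmax_weights):
    """One-step target: reward plus the discounted re-weighted sum over successors."""
    w_op = op(q_next)                                     # weights from the operator
    w_uq = softmax_weights(upper_bound(q_next, counts))   # weights from the bound
    w = reweight(w_op, w_uq)
    return reward + gamma * sum(wi * qi for wi, qi in zip(w, q_next))

if __name__ == "__main__":
    q_next = [1.0, 2.0, 0.5]   # estimated values of the successors
    counts = [10, 1, 0]        # hypothetical visit counts
    print(backup(0.0, q_next, counts, gamma=0.5, op=greedy_weights))
    print(backup(0.0, q_next, counts, gamma=0.5, op=softmax_weights))
```

Under this product rule, a greedy operator stays greedy after re-weighting, because the uncertainty weights are strictly positive and the operator weights are zero everywhere except the arg-max. This matches the abstract's claim only in spirit; the paper's own construction may differ.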

Open Access Status

This publication is not available as open access

Volume

258

Article Number

109998

Funding Number

2022205200100595

Funding Sponsor

Jiangsu Provincial Department of Education

Link to publisher version (DOI)

http://dx.doi.org/10.1016/j.knosys.2022.109998
