Scopus Harvesting Series

Reinforcement learning based monotonic policy for online resource allocation

Authors

Pankaj Mishra, Nagoya Institute of Technology
Ahmed Moustafa, Nagoya Institute of Technology

Publication Name

Future Generation Computer Systems

Abstract

This research aims to design an optimal and strategyproof mechanism for online resource allocation problems. In such problems, consumers arrive randomly with their resource requests. As a result, future resource demands are uncertain. In addition, the allocation and payment decisions depend on the providers' past experiences. To address these challenges, we propose a novel reinforcement learning algorithm for optimising the resource allocation policy. The proposed algorithm adopts a novel monotonic reward shaping function that uses a dominant-resource multi-label classification technique. Finally, a critical payment value is calculated to maintain strategyproofness in the online environment. The experimental evaluations show that the proposed mechanism achieves results that are within 96% of the optimal social welfare while outperforming the other mechanisms that use fixed pricing.

Open Access Status

This publication is not available as open access

Volume

138

First Page

313

Last Page

327

Link to publisher version (DOI)

http://dx.doi.org/10.1016/j.future.2021.09.023
