Reinforcement learning based monotonic policy for online resource allocation

Publication Name

Future Generation Computer Systems

Abstract

This research aims to design an optimal and strategyproof mechanism for online resource allocation problems. In such problems, consumers arrive at arbitrary times with their resource requests, so future resource demand is uncertain. In addition, the allocation and payment decisions depend on the providers’ past experiences. To address these challenges, we propose a novel reinforcement learning algorithm for optimising the resource allocation policy. The proposed algorithm adopts a novel monotonic reward shaping function that uses a dominant-resource multi-label classification technique. Finally, a critical payment value is calculated to maintain strategyproofness in the online environment. The experimental evaluations show that the proposed mechanism achieves social welfare within 96% of the optimal while outperforming other mechanisms that use fixed pricing.
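
The link between a monotonic allocation policy and a critical payment can be illustrated with a minimal sketch. It is not the paper's algorithm: the learned policy is replaced by a hypothetical fixed density threshold, and the names `dominant_share`, `critical_payment`, `min_density`, and the capacity and demand figures are all assumptions made for illustration. The only property relied on is that the allocation rule is monotone in the bid, so the smallest winning bid can be found by bisection and charged as the payment.

```python
from typing import Callable, Sequence


def dominant_share(demand: Sequence[float], capacity: Sequence[float]) -> float:
    """Largest fraction of any single resource that the request would consume."""
    return max(d / c for d, c in zip(demand, capacity))


def critical_payment(
    wins: Callable[[float], bool], declared_bid: float, tol: float = 1e-6
) -> float:
    """Approximate the smallest bid at which a monotone rule still allocates.

    Assumes `wins` is monotone: if a bid wins, every higher bid also wins.
    """
    if not wins(declared_bid):
        return 0.0  # a rejected request pays nothing
    if wins(0.0):
        return 0.0  # the request would win at any bid
    lo, hi = 0.0, declared_bid  # invariant: lo loses, hi wins
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if wins(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical stand-in for the learned policy: accept a request when its
# bid per unit of dominant share reaches a fixed threshold.
capacity = [16.0, 64.0]  # assumed provider capacity, e.g. CPU cores and GB of RAM
demand = [4.0, 8.0]      # assumed request
min_density = 20.0       # assumed acceptance threshold
share = dominant_share(demand, capacity)


def wins(bid: float) -> bool:
    return bid / share >= min_density


payment = critical_payment(wins, declared_bid=9.0)
```

Because the payment depends only on the threshold and the request's demand, not on the declared bid, a consumer in this sketch gains nothing by misreporting its valuation; that independence is what the critical payment is meant to provide.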

Open Access Status

This publication is not available as open access

Volume

138

First Page

313

Last Page

327

Link to publisher version (DOI)

http://dx.doi.org/10.1016/j.future.2021.09.023