Reinforcement-Mining: Protecting Reward in Selfish Mining

Publication Name

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


Selfish mining is notorious for earning the attacker rewards disproportionate to its mining power in Proof-of-Work (PoW) consensus-based blockchains, e.g., Bitcoin. Unfair reward distribution may cause some honest miners to quit mining, which seriously weakens the security of a PoW blockchain, since that security is guaranteed by strong mining power. Various countermeasures have been proposed to alleviate this problem, but they are generally expensive to implement, e.g., they require upgrading the blockchain backbone protocol. In this work, we propose a method, named Reinforcement-Mining, that protects honest miners’ mining rewards and thereby mitigates the harm of selfish mining. The key insight of Reinforcement-Mining is to employ a deep reinforcement learning framework to choose the optimal policy for honest miners to protect their rewards when the blockchain is under a selfish mining attack. We conduct experiments on both mining reward and the chain quality property. Analysis of the experimental results demonstrates that our approach moderates the unfair reward distribution caused by selfish mining and improves the chain quality of the blockchain. The proposed method may still be far from practical application; however, it provides a new perspective on defending against selfish mining.
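To make "rewards disproportionate to mining power" concrete, the following is a minimal sketch that evaluates the closed-form relative revenue of a selfish miner from the classic Eyal and Sirer analysis of selfish mining. It is not taken from this publication and is not part of Reinforcement-Mining. The function name and the parameter names `alpha` (attacker's share of mining power) and `gamma` (fraction of honest miners that build on the attacker's block during a tie) are illustrative assumptions.

```python
def selfish_relative_revenue(alpha: float, gamma: float) -> float:
    """Relative revenue of a selfish miner under the Eyal-Sirer model.

    alpha: attacker's fraction of total mining power (assumed in [0, 0.5)).
    gamma: fraction of honest power that mines on the attacker's branch
           when two branches of equal length compete (assumed in [0, 1]).
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    numerator = (
        alpha * (1 - alpha) ** 2 * (4 * alpha + gamma * (1 - 2 * alpha))
        - alpha ** 3
    )
    denominator = 1 - alpha * (1 + (2 - alpha) * alpha)
    return numerator / denominator


if __name__ == "__main__":
    # Compare the attacker's revenue share with its power share (the fair share).
    for alpha in (0.20, 1 / 3, 0.40):
        revenue = selfish_relative_revenue(alpha, gamma=0.0)
        print(f"alpha={alpha:.3f}  fair={alpha:.3f}  selfish={revenue:.3f}")
```

Under this model with `gamma = 0`, the attacker's revenue share exceeds its power share only once `alpha` is above one third; below that, selfish mining earns less than honest mining.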

Open Access Status

This publication is not available as open access


Volume

13600 LNCS