Towards Minimising Perturbation Rate for Adversarial Machine Learning with Pruning
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Deep neural networks can be potentially vulnerable to adversarial samples. For example, by introducing tiny perturbations in the data sample, the model behaviour may be significantly altered. While the adversarial samples can be leveraged to enhance the model robustness and performance with adversarial training, one critical attribute of the adversarial samples is the perturbation rate. A lower perturbation rate means a smaller difference between the adversarial and the original sample. It results in closer features learnt from the model for the adversarial and original samples, resulting in higher-quality adversarial samples. How to design a successful attacking algorithm with a minimum perturbation rate remains challenging. In this work, we consider pruning algorithms to dynamically minimise the perturbation rate for adversarial attacks. In particularly, we propose, for the first time, an attribution based perturbation reduction method named Min-PR for white-box adversarial attacks. The comprehensive experiment results demonstrate Min-PR can achieve minimal perturbation rates of adversarial samples while providing guarantee to train robust models. The code in this paper is available at: https://github.com/LMBTough/Min-PR.
Open Access Status
This publication is not available as open access