Black-Box Audio Adversarial Example Generation Using Variational Autoencoder

Publication Name

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


Automatic speech recognition (ASR) applications are ubiquitous these days. A variety of commercial products utilize powerful ASR capabilities to transcribe user speech. However, as with other deep learning models, the techniques underlying ASR models suffer from adversarial example (AE) attacks. Audio AEs resemble non-suspicious audio to the casual listener, but will be incorrectly transcribed by an ASR system. Existing black-box AE techniques require excessive requests sent to a targeted system. Such suspicious behavior can potentially trigger a threat alert on the system. This paper proposes a method of generating black-box AEs in a way that significantly reduces the required amount of requests. We describe our proposed method and presents experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.

Open Access Status

This publication is not available as open access


12919 LNCS

First Page


Last Page




Link to publisher version (DOI)