Identifying Optimal Features for Multi-channel Acoustic Scene Classification
© 2019 IEEE. Recent approaches to audio classification are typically developed for single-channel recordings of acoustic events. In contrast, acoustic classification of multi-channel recordings, especially of household-recorded acoustic scenes, has not been thoroughly investigated. In this paper, we consider domestic multi-channel audio classification using a Deep Convolutional Neural Network (DCNN) model. The DCNN is applied to cepstral and spectral features, including Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and log-mel energies. These features are compared against our proposed methodology, which uses spectro-temporal features represented by the scalogram and computed through the Continuous Wavelet Transform (CWT). Furthermore, we evaluate two different methods of extracting and combining features from the multi-channel recordings. Experimental results show that when the scalogram features of the four channels are summed, the resulting average F1-score across a range of domestic audio scenes is close to 98%, outperforming the commonly used cepstral and spectral methods.
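The channel-combination step described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the mother wavelet, scale range, and function names (`ricker`, `cwt`, `summed_scalogram`) are assumptions, since the abstract only states that scalograms are computed via the CWT and summed across the four channels.

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican-hat) wavelet of `points` samples at scale `a`."""
    vec = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (vec / a) ** 2) * np.exp(-(vec ** 2) / (2.0 * a ** 2))

def cwt(data, widths):
    """Continuous wavelet transform of a 1-D signal: one row per scale."""
    out = np.empty((len(widths), len(data)))
    for i, w in enumerate(widths):
        wavelet = ricker(min(10 * int(w), len(data)), w)
        out[i] = np.convolve(data, wavelet, mode="same")
    return out

def summed_scalogram(channels, widths):
    """Sum the scalogram magnitudes of all channels.

    channels: array of shape (n_channels, n_samples)
    Returns an array of shape (len(widths), n_samples).
    """
    total = np.zeros((len(widths), channels.shape[1]))
    for ch in channels:
        total += np.abs(cwt(ch, widths))
    return total

# Toy 4-channel "recording": four copies of a 100 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.stack([np.sin(2 * np.pi * 100.0 * t)] * 4)
S = summed_scalogram(x, np.arange(1, 31))
print(S.shape)  # (30, 8000)
```

The resulting two-dimensional scale-by-time array can then be treated as an image-like input to the DCNN, in the same way spectrogram-based features are commonly used.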