Scalogram Neural Network Activations with Machine Learning for Domestic Multi-channel Audio Classification
© 2019 IEEE. Current methodologies for audio classification, particularly of multi-channel audio, commonly rely on individual deep learning approaches. In this paper, we examine domestic multi-channel audio classification by comparing various combinations of existing pre-trained Neural Network (NN) models with a Support Vector Machine (SVM) for classification. The NN model is first trained with spectro-temporal features extracted from the audio, represented as scalogram images generated through the Continuous Wavelet Transform (CWT). Activations extracted from a selected layer of the neural network model are then used as features to train the machine learning classifier. Using the network activations learnt by the deep learning component of the classifier strengthens the time-frequency features of the signal that are extracted from the spectrogram, which allows further improvement in accuracy. On the full SINS development database, the best result was an F1-score of over 97%, obtained from the tenth layer of the Xception network combined with a multi-class Linear SVM. This is a substantial improvement over the top-performing F1-score in the DCASE 2018 Task 5 challenge, which stands at around 89%.
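As an illustration of the scalogram step only, the following is a minimal sketch of a CWT magnitude computation in plain Python. The abstract does not state which mother wavelet, scales, or sampling rate were used, so the complex Morlet wavelet, the centre-frequency parameter `w0 = 6`, the truncation to four scale widths, and the helper names `morlet`, `scale_for_frequency`, and `scalogram` are all assumptions made for this example. The pre-trained network and the SVM stage are not shown.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at time t (assumed wavelet; small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scale_for_frequency(freq_hz, w0=6.0):
    """Approximate Morlet scale (in seconds) whose response peaks at freq_hz."""
    return w0 / (2.0 * math.pi * freq_hz)


def scalogram(signal, scales, dt=1.0, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample."""
    n = len(signal)
    rows = []
    for s in scales:
        # Truncate the wavelet to +/- 4 scale widths, where its envelope is negligible.
        half = int(math.ceil(4.0 * s / dt))
        # Conjugated, scaled wavelet samples: psi*((t - b) / s) * dt / sqrt(s).
        kernel = [
            morlet(k * dt / s, w0).conjugate() * dt / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for b in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = b + j - half
                if 0 <= idx < n:  # samples outside the signal are treated as zero
                    acc += signal[idx] * kv
            row.append(abs(acc))
        rows.append(row)
    return rows


# Hypothetical demo signal: a 10 Hz tone sampled at 200 Hz for two seconds.
fs = 200.0
tone = [math.sin(2.0 * math.pi * 10.0 * k / fs) for k in range(400)]
freqs = [5.0, 10.0, 20.0]
image = scalogram(tone, [scale_for_frequency(f) for f in freqs], dt=1.0 / fs)
mean_energy = [sum(row) / len(row) for row in image]
print("dominant analysis frequency (Hz):", freqs[mean_energy.index(max(mean_energy))])
```

In this sketch each row of `image` corresponds to one analysis frequency, and for the demo tone the 10 Hz row should carry the most energy. To feed a pre-trained image network, such a matrix would still need to be computed over many more scales, mapped to pixel intensities, and resized to the network's input size. Those steps depend on choices the abstract does not describe.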