Acoustic scene classification: from a hybrid classifier to deep learning
Authors: Anastasios Vafeiadis, Dimitrios Kalatzis, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen and Raouf Hamzaoui
This paper describes our contribution to the 2017 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. We investigated two approaches to the acoustic scene classification task. First, we combined time-domain and frequency-domain features with a hybrid Support Vector Machine-Hidden Markov Model (SVM-HMM) classifier, achieving an average accuracy over four folds of 80.9% on the development dataset and 61.0% on the evaluation dataset. Second, by exploiting data augmentation techniques and using the whole segment (as opposed to splitting it into sub-sequences) as input, we boosted the accuracy of our convolutional neural network (CNN) system to 95.9%. However, because of the small number of kernels used in the CNN and its failure to capture the global information of the audio signals, it achieved an accuracy of only 49.5% on the evaluation dataset. Our two approaches outperformed the DCASE baseline method, which uses log-mel band energies for feature extraction and a Multi-Layer Perceptron (MLP) and achieves an average accuracy over four folds of 74.8%.
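As an illustration of the log-mel band energies mentioned for the baseline, the following is a minimal NumPy sketch of how such features are commonly computed. It is not the baseline's or the authors' implementation. The function names and all parameter values (sampling rate, FFT size, hop size, number of mel bands, the Hann window and the HTK-style mel formula) are assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_energies(signal, sr=44100, n_fft=2048, hop=1024, n_mels=40):
    # All default values here are illustrative assumptions.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame, then projection onto the mel filters.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    # A small constant avoids taking the log of zero.
    return np.log(mel_energy + 1e-10)


if __name__ == "__main__":
    # One second of a 440 Hz tone as a stand-in for a real recording.
    t = np.arange(44100) / 44100.0
    features = log_mel_energies(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)  # (number of frames, number of mel bands)
```

The result is a frames-by-bands matrix that could be fed to a classifier such as an MLP or a CNN; in practice a library routine would usually replace the hand-written filterbank.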
DCASE IEEE Audio and Acoustic Signal Processing (AASP) challenge and workshop
16-17 November 2017