Audio classification using attention-augmented convolutional neural network

Wu, Yu, Mao, Hua and Yi, Zhang (2018) Audio classification using attention-augmented convolutional neural network. Knowledge-Based Systems, 161. pp. 90-100. ISSN 0950-7051

Wu et al - Audio classification using attention-augmented convolutional neural network AAM.pdf - Accepted Version


Audio classification, a set of important and challenging tasks, groups speech signals according to speakers' identities, accents, and emotional states. Due to the high dimensionality of audio data, task-specific hand-crafted feature extraction is always required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationships among features have not been fully exploited. In this paper, the original speech signal is first represented as a spectrogram and then split along the frequency domain to form a frequency-distributed spectrogram. This paper proposes a task-independent model, called FreqCNN, to automatically extract distinctive features from each frequency band using convolutional kernels. Furthermore, an attention mechanism is introduced to systematically enhance the features from certain frequency bands. The proposed FreqCNN is evaluated on three publicly available speech databases through three independent classification tasks. The obtained results demonstrate superior performance over the state-of-the-art.
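The pipeline the abstract describes — split a spectrogram into frequency bands, extract features per band, and reweight bands with attention — can be sketched as follows. This is a minimal numpy illustration under assumed shapes and a hypothetical energy-based scoring function, not the authors' FreqCNN implementation; the per-band convolutional extractor is stood in for by simple mean pooling.

```python
import numpy as np

def split_into_bands(spec, n_bands):
    """Split a (freq_bins, time_frames) spectrogram along the frequency axis."""
    return np.array_split(spec, n_bands, axis=0)

def band_features(band):
    """Placeholder for the per-band convolutional extractor: mean-pool over frequency."""
    return band.mean(axis=0)  # shape: (time_frames,)

def attention_weights(features):
    """Score each band (hypothetical: mean energy) and softmax-normalise."""
    scores = np.array([f.mean() for f in features])
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
spec = rng.random((64, 100))          # toy spectrogram: 64 freq bins x 100 frames
bands = split_into_bands(spec, 4)     # 4 frequency bands of 16 bins each
feats = [band_features(b) for b in bands]
w = attention_weights(feats)          # weights sum to 1
fused = sum(wi * f for wi, f in zip(w, feats))  # attention-weighted fusion
```

In the actual model, each band would pass through learned convolutional kernels and the attention scores would be trained jointly with the classifier; the fixed energy score here only illustrates the weighting mechanism.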

Item Type: Article
Uncontrolled Keywords: Audio classification, Spectrograms, Convolutional neural networks, Attention mechanism
Subjects: G400 Computer Science
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Depositing User: Paul Burns
Date Deposited: 12 Jun 2019 17:08
Last Modified: 01 Aug 2021 11:03
