Wu, Yu, Mao, Hua and Yi, Zhang (2018) Audio classification using attention-augmented convolutional neural network. Knowledge-Based Systems, 161. pp. 90-100. ISSN 0950-7051
Abstract
Audio classification, a set of important and challenging tasks, groups speech signals according to speakers' identities, accents, and emotional states. Due to the high dimensionality of audio data, task-specific hand-crafted feature extraction is always required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationship among features has not been fully exploited. In this paper, the original speech signal is first represented as a spectrogram, which is then split along the frequency domain to form a frequency-distributed spectrogram. This paper proposes a task-independent model, called FreqCNN, to automatically extract distinctive features from each frequency band using convolutional kernels. Furthermore, an attention mechanism is introduced to systematically enhance the features from certain frequency bands. The proposed FreqCNN is evaluated on three publicly available speech databases through three independent classification tasks. The obtained results demonstrate superior performance over the state of the art.
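The abstract outlines the FreqCNN pipeline at a high level: split the spectrogram into frequency bands, extract features from each band with convolutional kernels, and use attention to emphasize certain bands. The following is a minimal PyTorch sketch of that general idea only; the class name `FreqBandCNN`, the layer sizes, the number of bands, and the scalar per-band attention score are all illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqBandCNN(nn.Module):
    """Sketch of the frequency-band-plus-attention idea from the abstract.
    All names and dimensions here are hypothetical."""

    def __init__(self, n_bands=4, feat_dim=64, n_classes=10):
        super().__init__()
        self.n_bands = n_bands
        # One small CNN per frequency band (shared weights would also be plausible).
        self.band_cnns = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(16, feat_dim),
            )
            for _ in range(n_bands)
        )
        # Scalar attention score per band, computed from that band's features.
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, spec):
        # spec: (batch, 1, freq, time) spectrogram.
        bands = torch.chunk(spec, self.n_bands, dim=2)  # split along frequency
        feats = torch.stack(
            [cnn(band) for cnn, band in zip(self.band_cnns, bands)], dim=1
        )  # (batch, n_bands, feat_dim)
        weights = F.softmax(self.attn(feats), dim=1)    # (batch, n_bands, 1)
        pooled = (weights * feats).sum(dim=1)           # attention-weighted sum
        return self.classifier(pooled)

model = FreqBandCNN()
logits = model(torch.randn(2, 1, 128, 100))  # e.g. 128 frequency bins, 100 frames
print(logits.shape)  # torch.Size([2, 10])
```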
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Audio classification, Spectrograms, Convolutional neural networks, Attention mechanism |
| Subjects: | G400 Computer Science |
| Department: | Faculties > Engineering and Environment > Computer and Information Sciences |
| Depositing User: | Paul Burns |
| Date Deposited: | 12 Jun 2019 17:08 |
| Last Modified: | 01 Aug 2021 11:03 |
| URI: | http://nrl.northumbria.ac.uk/id/eprint/39658 |