Evolving deep neural networks for image and audio classification using swarm-based optimization

Slade, Samuel Jonathan (2023) Evolving deep neural networks for image and audio classification using swarm-based optimization. Doctoral thesis, Northumbria University.

Text (Doctoral thesis)
slade.samuel_phd(16029067).pdf - Submitted Version

Fine-tuning and designing the architecture of deep learning models to attain optimal performance is an intricate and time-intensive endeavor demanding expertise and multiple iterations. Streamlining this process holds promise, with swarm optimization algorithms emerging as a potential solution. However, these algorithms can be susceptible to local optima and are greatly influenced by their initial conditions, particularly in scenarios involving transfer learning and intricate loss functions, where research remains limited. Moreover, the choice of the most suitable optimization algorithm can be task-specific, necessitating tailored solutions.

To tackle these challenges, the research undertook the following approach:

1. Environmental Particle Swarm Optimisation (EnvPSO) was introduced, a variant of particle swarm optimization, which leverages gradients derived from the anticipated solution space to optimize a multi-stream Convolutional Neural Network (CNN) for human action recognition. It introduces a novel layer strip-back parameter for determining the number of frozen layers during transfer learning.

2. Neural Inference Search (NIS) was then developed, a swarm algorithm that employs neural networks to predict velocities, enhancing the optimization of deep learning segmentation models with multi-loss functions. This approach addresses memory constraints observed in EnvPSO while introducing diverse velocity calculations.

3. Finally, Cluster Search Optimisation (CSO) was proposed, a novel approach utilizing clustering of particle positions and historical data for hyper-parameter and architecture search in deep learning models designed for audio emotion classification. This method addresses the issue of randomness in velocity predictions from NIS, incorporating dynamic convergence monitoring. It advances the optimization process by simultaneously optimizing hyper-parameters and architecture.

These research initiatives build upon one another, commencing with EnvPSO, which enhances particle swarm optimization for hyper-parameter fine-tuning, primarily in the context of Human Action Recognition (HAR), by predicting the topology of the search space through convolution. Subsequently, NIS embeds a representation of the search space within a neural network to alleviate memory constraints observed in EnvPSO, introducing a more diverse set of velocity calculations. The pinnacle of our work, CSO, focuses on mitigating randomness in velocity predictions from NIS by introducing a clustering-based approach and dynamic convergence monitoring. This approach represents a significant advancement, optimizing both hyper-parameters and architecture.
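For orientation, the baseline that these variants extend can be sketched as follows. This is a minimal sketch of textbook global-best particle swarm optimization, not an implementation of EnvPSO, NIS, or CSO. The function name `pso_minimize`, its parameters, and the default coefficients are illustrative assumptions that do not come from the thesis.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Textbook global-best PSO (illustrative; names and defaults are assumptions).

    objective: callable mapping a list of floats to a score to minimize.
    bounds: list of (low, high) pairs, one per search dimension.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_best_score.__getitem__)
    global_best = personal_best[best_idx][:]
    global_best_score = personal_best_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia term + pull to personal best + pull to global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))

            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score

    return global_best, global_best_score


def sphere(x):
    """Toy objective standing in for a validation loss."""
    return sum(v * v for v in x)


best, best_score = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a hyper-parameter search, the objective would typically be the validation loss of a network trained with the candidate settings, and each dimension would encode one hyper-parameter. The variants summarized above change parts of this loop in ways this sketch does not attempt to reproduce.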

Our research findings reveal the increased performance of EnvPSO, especially when optimizing an ensemble of multiple CNN streams for still image human action recognition. EnvPSO outperforms state-of-the-art models by a margin of 1.49% on the Willow7 dataset and 1.4% on the BU101 dataset. Additionally, NIS enhances the performance of the DeepLabv3 model, achieving notable improvements of 4.2%, 0.28%, and 0.3% over the best-performing models on the MESSIDOR, Freiburg Forest, and CamVid datasets, respectively. Furthermore, CSO excels in optimizing CNN-Bidirectional Long Short-Term Memory (BiLSTM) architectures for audio emotion classification, surpassing existing work by substantial margins of 2.9%, 4.4%, and 17.1% on the Emo-DB, SAVEE, and TESS datasets, respectively.

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: convolutional neural networks, neural architecture search, hyperparameter optimisation, audio emotion recognition, semantic segmentation
Subjects: G400 Computer Science
Department: Faculties > Engineering and Environment > Computer and Information Sciences
University Services > Graduate School > Doctor of Philosophy
Depositing User: John Coen
Date Deposited: 24 Nov 2023 12:06
Last Modified: 26 Apr 2024 03:31
URI: https://nrl.northumbria.ac.uk/id/eprint/51658
