An evolving ensemble model of multi-stream convolutional neural networks for human action recognition in still images

Slade, Samuel, Zhang, Li, Yu, Yonghong and Lim, Chee Peng (2022) An evolving ensemble model of multi-stream convolutional neural networks for human action recognition in still images. Neural Computing and Applications, 34 (11). pp. 9205-9231. ISSN 0941-0643

Text (Final published version)
Slade2022_Article_AnEvolvingEnsembleModelOfMulti.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Text (Advance online version)
Advance online version.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Official URL: https://doi.org/10.1007/s00521-022-06947-6

Abstract

Still image human action recognition (HAR) is a challenging problem owing to limited sources of information and to large intra-class and small inter-class variations, which require highly discriminative features. Transfer learning can produce such features by preserving prior knowledge while learning new representations. However, identifying the optimal number of re-trainable layers in the transfer learning process is itself a challenge, since this number varies between tasks. In this study, we aim to automate the identification of the optimal configuration. Specifically, we propose a novel particle swarm optimisation (PSO) variant, denoted as EnvPSO, for optimal hyper-parameter selection in the transfer learning process for HAR tasks with still images. It incorporates Gaussian fitness surface prediction and exponential search coefficients to overcome stagnation. It optimises the learning rate, batch size, and number of re-trained layers of a pre-trained convolutional neural network (CNN). To overcome the bias of single optimised networks, an ensemble model with three optimised CNN streams is introduced. The first and second streams employ raw images and segmentation masks yielded by Mask R-CNN as inputs, while the third stream fuses a pair of networks with raw images and saliency maps as inputs, respectively. The final prediction results are obtained by computing the average of class predictions from all three streams. By leveraging differences between learned representations within optimised streams, our ensemble model outperforms counterparts devised by PSO and other state-of-the-art methods for HAR. In addition, when evaluated using diverse artificial landscape functions, EnvPSO performs better than other search methods, with statistically significant differences in performance.
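
The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes each of the three streams outputs one score per action class and that the streams are weighted equally; the function names and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ensemble(stream_scores: Sequence[Sequence[float]]) -> List[float]:
    """Return the unweighted mean of per-class scores across streams."""
    if not stream_scores:
        raise ValueError("at least one stream is required")
    n_classes = len(stream_scores[0])
    if any(len(scores) != n_classes for scores in stream_scores):
        raise ValueError("all streams must score the same set of classes")
    n_streams = len(stream_scores)
    return [
        sum(scores[c] for scores in stream_scores) / n_streams
        for c in range(n_classes)
    ]


def predict_class(stream_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged score."""
    averaged = average_ensemble(stream_scores)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical class scores for one image over three action classes.
raw_image_stream = [0.6, 0.3, 0.1]   # stream 1: raw image input
mask_stream = [0.2, 0.5, 0.3]        # stream 2: segmentation mask input
fused_stream = [0.3, 0.6, 0.1]       # stream 3: raw image + saliency map
streams = [raw_image_stream, mask_stream, fused_stream]

averaged_scores = average_ensemble(streams)
predicted = predict_class(streams)
```

In this example the raw image stream alone would pick class 0, but the averaged scores favour class 1, which shows how disagreement between streams can change the final decision.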

Item Type: Article
Additional Information: Funding Information: This work was supported in part by European Regional Development Fund (ERDF) and in part by RPPTV Ltd. for jointly funding a Ph.D. studentship via the Intensive Industrial Innovation Programme North East (IIIPNE, Grant No. 25R17P01847).
Uncontrolled Keywords: Convolutional neural network, Ensemble model, Human action recognition, Hyper-parameter optimisation, Object detection and classification
Subjects: G400 Computer Science; G700 Artificial Intelligence
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Depositing User: Rachel Branson
Date Deposited: 09 Feb 2022 14:15
Last Modified: 30 May 2022 10:30
URI: http://nrl.northumbria.ac.uk/id/eprint/48426
