Kinghorn, Philip, Zhang, Li and Shao, Ling (2019) A Hierarchical and Regional Deep Learning Architecture for Image Description Generation. Pattern Recognition Letters, 119. pp. 77-85. ISSN 0167-8655
Accepted_manuscript.pdf (Accepted Version), available under a Creative Commons Attribution Non-commercial No Derivatives 4.0 licence.
Abstract
This research proposes a distinctive deep learning network architecture for image captioning and description generation. Specifically, we propose a hierarchically trained deep network to increase the fluency and descriptiveness of the generated image captions. The proposed network consists of an initial region proposal generation step followed by two key stages of image description generation. Region proposal generation is based on the Region Proposal Network (RPN) from Faster R-CNN; it produces regions of interest that are then used to annotate and classify human and object attributes. The first key stage generates detailed label descriptions for each region of interest. The second stage uses a Recurrent Neural Network (RNN)-based encoder-decoder structure to translate these regional descriptions into a full image description. In particular, the proposed model can label scenes, objects, and human and object attributes simultaneously, which is achieved through multiple individually trained RNNs.
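The abstract gives no implementation details, so the following is only a data-flow sketch under stated assumptions, not the authors' code. It assumes proposals arrive as already-scored boxes (standing in for RPN output), prunes overlapping ones with greedy non-maximum suppression as Faster R-CNN does for its RPN proposals, runs a set of per-region labelers (hypothetical callables standing in for the individually trained RNNs), and flattens their outputs into one token sequence that a second-stage encoder-decoder could consume. The names `iou`, `nms`, `build_stage2_input`, the `<sep>` token, and the 0.7 IoU default are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thresh: float = 0.7) -> List[int]:
    """Greedy non-maximum suppression: visit boxes from highest score down and
    keep a box only if it overlaps no already-kept box by more than `thresh`.
    Returns the kept indices, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


def build_stage2_input(
    boxes: List[Box],
    scores: List[float],
    labelers: Dict[str, Callable[[Box], str]],  # hypothetical stand-ins for trained RNNs
    sep: str = "<sep>",
) -> List[str]:
    """Stage 1 (sketch): label every surviving region with each labeler, in a
    fixed (sorted) labeler order, then flatten the regional labels into a
    single token sequence for a stage-2 encoder-decoder."""
    tokens: List[str] = []
    for idx in nms(boxes, scores):
        region_words = [labelers[name](boxes[idx]) for name in sorted(labelers)]
        tokens.extend(region_words + [sep])
    return tokens
```

Each kept region contributes one word per labeler followed by `<sep>`. Scene-level labels, which describe the whole image rather than a region, are left out of this sketch; one option would be to prepend them to the sequence.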
The empirical results indicate that our results are comparable to those of existing work and considerably outperform state-of-the-art methods when evaluated on out-of-domain images from the IAPR TC-12 dataset, which is notable given that our system is not trained on images from any of the image captioning datasets. Evaluated with several well-known metrics, the proposed system achieves an improvement of ∼60% in BLEU-1 over existing methods on the IAPR TC-12 dataset. Moreover, compared with related methods, the proposed network requires substantially fewer training samples, leading to a much lower computational cost.
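BLEU-1 scores a generated caption by its clipped unigram precision against one or more reference captions, multiplied by a brevity penalty that discourages overly short outputs. The sketch below computes sentence-level BLEU-1 in plain Python. The abstract does not state whether the reported scores are corpus- or sentence-level or which tokenization was used, so this only illustrates the metric; the function name `bleu1` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def bleu1(candidate: List[str], references: List[List[str]]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # Clip each word's count by the most times it appears in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for word, n in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], n)
    clipped = sum(min(n, max_ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)
    # Effective reference length: closest to the candidate length (ties -> shorter).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision
```

For the classic degenerate candidate "the the the the the the the" against the references "the cat is on the mat" and "there is a cat on the mat", clipping caps the credit for "the" at 2, giving 2/7 ≈ 0.286 with no brevity penalty.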
Item Type: Article
Subjects: G400 Computer Science
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Depositing User: Becky Skoyles
Date Deposited: 02 Oct 2017 11:22
Last Modified: 01 Aug 2021 13:05
URI: http://nrl.northumbria.ac.uk/id/eprint/32015