Shen, Yuming, Zhang, Li and Shao, Ling (2017) Semi-supervised vision-language mapping via variational learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, Piscataway, pp. 1349-1354. ISBN 978-1-5090-4634-8
Full text not available from this repository.

Abstract
Understanding the semantic relations between vision and language data has become a research trend in artificial intelligence and robotic systems. The lack of training data is a key issue for vision-language understanding. We address the problem of image and sentence cross-modal retrieval when paired training samples are not sufficient. Inspired by recent work on variational inference, in this paper the autoencoding variational Bayes framework is extended into a novel semi-supervised model for the image-sentence mapping task. Our method does not require all training images and sentences to be paired. The proposed model is an end-to-end system and consists of a two-level variational embedding structure: unpaired data are involved in the first-level embedding to support intra-modality statistics, so that the lower bound of the joint marginal likelihood of paired data embeddings can be better approximated. The proposed retrieval model is evaluated on two popular datasets, Flickr30K and Flickr8K, producing superior performance compared with related state-of-the-art methods.
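The abstract describes a two-level structure: per-modality variational embeddings trained on both paired and unpaired data, with a joint embedding over paired samples on top. The sketch below is a minimal, hypothetical PyTorch illustration of that idea based only on the abstract; the layer sizes, feature dimensions, loss terms, and class names are assumptions and do not reflect the authors' released code.

```python
# Hypothetical sketch of a two-level semi-supervised variational embedding
# for image-sentence retrieval. Dimensions and loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalEncoder(nn.Module):
    """Maps a feature vector to the mean and log-variance of a Gaussian latent."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, 512)
        self.mu = nn.Linear(512, latent_dim)
        self.logvar = nn.Linear(512, latent_dim)

    def forward(self, x):
        h = F.relu(self.fc(x))
        return self.mu(h), self.logvar(h)


def reparameterize(mu, logvar):
    # z = mu + sigma * eps, the standard VAE reparameterisation trick
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)


def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)), averaged over the batch
    return -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))


class TwoLevelEmbedding(nn.Module):
    """Level 1: per-modality encoders/decoders usable on paired AND unpaired data.
    Level 2: a joint encoder over the concatenated level-1 codes of paired data."""
    def __init__(self, img_dim=4096, txt_dim=300, z1_dim=256, z2_dim=128):
        super().__init__()
        self.img_enc = VariationalEncoder(img_dim, z1_dim)
        self.txt_enc = VariationalEncoder(txt_dim, z1_dim)
        self.img_dec = nn.Linear(z1_dim, img_dim)
        self.txt_dec = nn.Linear(z1_dim, txt_dim)
        self.joint_enc = VariationalEncoder(2 * z1_dim, z2_dim)

    def level1_loss(self, x, enc, dec):
        # Unsupervised reconstruction + KL term; this is where unpaired samples
        # contribute intra-modality statistics
        mu, logvar = enc(x)
        recon = dec(reparameterize(mu, logvar))
        return F.mse_loss(recon, x) + kl_divergence(mu, logvar)

    def level2_loss(self, img, txt):
        # Joint embedding over paired image/sentence level-1 codes
        z_img = reparameterize(*self.img_enc(img))
        z_txt = reparameterize(*self.txt_enc(txt))
        mu, logvar = self.joint_enc(torch.cat([z_img, z_txt], dim=1))
        return kl_divergence(mu, logvar)


if __name__ == "__main__":
    model = TwoLevelEmbedding()
    paired_img, paired_txt = torch.randn(8, 4096), torch.randn(8, 300)
    unpaired_img, unpaired_txt = torch.randn(16, 4096), torch.randn(16, 300)
    loss = (model.level1_loss(unpaired_img, model.img_enc, model.img_dec)
            + model.level1_loss(unpaired_txt, model.txt_enc, model.txt_dec)
            + model.level2_loss(paired_img, paired_txt))
    loss.backward()
    print(float(loss))
```

In this toy setup the level-1 terms can be computed on unpaired images or sentences alone, while the level-2 term only sees paired samples, mirroring the semi-supervised split the abstract describes; the actual objective in the paper is a variational lower bound rather than this simplified sum.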
| Item Type: | Book Section |
|---|---|
| Subjects: | G700 Artificial Intelligence |
| Department: | Faculties > Engineering and Environment > Computer and Information Sciences |
| Depositing User: | Becky Skoyles |
| Date Deposited: | 05 Sep 2017 08:45 |
| Last Modified: | 12 Oct 2019 20:46 |
| URI: | http://nrl.northumbria.ac.uk/id/eprint/31737 |