Semi-supervised vision-language mapping via variational learning

Shen, Yuming, Zhang, Li and Shao, Ling (2017) Semi-supervised vision-language mapping via variational learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, Piscataway, pp. 1349-1354. ISBN 978-1-5090-4634-8

Full text not available from this repository.

Understanding the semantic relations between vision and language data has become a research trend in artificial intelligence and robotic systems. A shortage of training data is a central obstacle to vision-language understanding. We address image-sentence cross-modal retrieval when paired training samples are scarce. Inspired by recent work in variational inference, in this paper the autoencoding variational Bayes framework is extended to a semi-supervised model for the image-sentence mapping task. Our method does not require all training images and sentences to be paired. The proposed model is an end-to-end system consisting of a two-level variational embedding structure: unpaired data participate in the first-level embedding to support intra-modality statistics, so that the lower bound of the joint marginal likelihood of paired-data embeddings can be better approximated. The proposed retrieval model is evaluated on two popular datasets, Flickr30K and Flickr8K, and achieves superior performance compared with related state-of-the-art methods.
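The "lower bound of the joint marginal likelihood" the abstract refers to is the evidence lower bound (ELBO) from autoencoding variational Bayes. The following is a minimal, illustrative NumPy sketch of a single-modality Gaussian ELBO with the reparameterization trick; it is not the authors' two-level cross-modal model, and all names (`encode`, `elbo`, the linear weight matrices) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Hypothetical linear Gaussian encoder q(z|x): returns mean and log-variance.
    return x @ W_mu, x @ W_logvar

def elbo(x, W_mu, W_logvar, W_dec):
    # One-sample Monte Carlo estimate of the evidence lower bound:
    #   ELBO = E_q[log p(x|z)] - KL(q(z|x) || N(0, I))
    mu, logvar = encode(x, W_mu, W_logvar)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    x_hat = z @ W_dec                         # linear Gaussian decoder mean
    recon = -0.5 * np.sum((x - x_hat) ** 2)   # log-likelihood up to a constant
    # Closed-form KL divergence between diagonal Gaussians q(z|x) and N(0, I)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon - kl

# Toy data and randomly initialized weights, for illustration only.
d_x, d_z = 8, 2
x = rng.standard_normal((4, d_x))
W_mu = 0.1 * rng.standard_normal((d_x, d_z))
W_logvar = 0.01 * rng.standard_normal((d_x, d_z))
W_dec = 0.1 * rng.standard_normal((d_z, d_x))
print(elbo(x, W_mu, W_logvar, W_dec))
```

In the paper's semi-supervised setting, unpaired images and sentences would contribute only to per-modality bounds of this form (the first-level embedding), while paired samples additionally tie the two latent spaces together at the second level.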

Item Type: Book Section
Subjects: G700 Artificial Intelligence
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Depositing User: Becky Skoyles
Date Deposited: 05 Sep 2017 08:45
Last Modified: 12 Oct 2019 20:46
