Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval


We address the problem of image-to-video retrieval. Given a query image, the aim is to identify the frame or scene within a collection of videos that best matches the visual input. Matching images to videos is an asymmetric task: it requires features that capture the visual information in still images while, at the same time, compacting the temporal correlation across video frames. Methods proposed so far are based on the temporal aggregation of hand-crafted features. In this work, we propose a deep learning architecture for learning specific asymmetric spatio-temporal embeddings for image-to-video retrieval. Our method learns non-linear projections from training data for both images and videos and projects their visual content into a common latent space, where they can be easily compared with a standard similarity function. Experiments conducted here show that our proposed asymmetric spatio-temporal embeddings outperform the state of the art on standard image-to-video retrieval datasets.
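The retrieval pipeline the abstract describes can be sketched as follows: two separate non-linear projections (one for images, one for videos, the asymmetric part), a temporal aggregation step on the video side, and cosine similarity in the shared latent space. Everything below is a hypothetical illustration with randomly initialised weights and mean-pooling as the aggregation; the paper's actual networks, aggregation scheme, and training procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Two-layer non-linear projection (ReLU), standing in for the
    # learned embedding networks described in the abstract.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Hypothetical dimensions: per-frame/image descriptors and latent space size.
dim_feat, d_latent, hidden = 512, 128, 256

# Randomly initialised weights for illustration only; in the paper these
# would be trained end-to-end from image/video pairs.
Wi1, bi1 = rng.standard_normal((dim_feat, hidden)) * 0.01, np.zeros(hidden)
Wi2, bi2 = rng.standard_normal((hidden, d_latent)) * 0.01, np.zeros(d_latent)
Wv1, bv1 = rng.standard_normal((dim_feat, hidden)) * 0.01, np.zeros(hidden)
Wv2, bv2 = rng.standard_normal((hidden, d_latent)) * 0.01, np.zeros(d_latent)

def embed_image(img_feat):
    # Image branch: project a single image descriptor into the latent space.
    z = mlp(img_feat, Wi1, bi1, Wi2, bi2)
    return z / np.linalg.norm(z)

def embed_video(frame_feats):
    # Video branch: temporal aggregation (here, mean-pooling over frames)
    # followed by its own projection -- asymmetric w.r.t. the image branch.
    pooled = frame_feats.mean(axis=0)
    z = mlp(pooled, Wv1, bv1, Wv2, bv2)
    return z / np.linalg.norm(z)

def retrieve(query_img, video_frame_sets):
    # Rank videos by cosine similarity (dot product of unit vectors)
    # in the common latent space.
    q = embed_image(query_img)
    sims = [float(q @ embed_video(f)) for f in video_frame_sets]
    return int(np.argmax(sims)), sims

# Toy usage: one query image against three videos of different lengths.
query = rng.standard_normal(dim_feat)
videos = [rng.standard_normal((n, dim_feat)) for n in (30, 45, 60)]
best, sims = retrieve(query, videos)
```

Because both branches end in L2-normalised vectors, the dot product is exactly the cosine similarity, so any standard nearest-neighbour index over the latent space can serve the retrieval step.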

Divisions: College of Engineering & Physical Sciences > School of Informatics and Digital Engineering > Computer Science
College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
Additional Information: © 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms
Event Type: Other
Event Dates: 2018-09-03 - 2018-09-06
PURE Output Type: Conference contribution
Published Date: 2018-09-06
Accepted Date: 2018-07-02
Authors: Garcia, Noa
Vogiatzis, George (ORCID Profile 0000-0002-3226-0603)



Version: Published Version
