Spatial and temporal representations for multi-modal visual retrieval

Abstract

This dissertation studies the problem of finding relevant content within a visual collection according to a specific query. It addresses three key modalities, depending on the kind of data to be processed: symmetric visual retrieval, asymmetric visual retrieval and cross-modal retrieval. In symmetric visual retrieval, the query and the elements in the collection are the same kind of visual data, i.e. both images or both videos. Inspired by the human visual perception system, we propose new techniques to estimate visual similarity for image-to-image retrieval based on non-metric functions, improving image retrieval performance on top of state-of-the-art methods. Asymmetric visual retrieval, on the other hand, is the problem in which the queries and the elements in the dataset are different types of visual data. We propose methods to aggregate the temporal information of video segments so that image-video comparisons can be computed using similarity functions. When evaluated on image-to-video retrieval datasets, our algorithms drastically reduce memory storage while maintaining high accuracy rates. Finally, we introduce new solutions for cross-modal retrieval, the task in which either the queries or the elements in the collection are non-visual objects. In particular, we study text-image retrieval in the domain of art by introducing new models for semantic art understanding, obtaining results close to human performance. Overall, this thesis advances the state of the art in visual retrieval by presenting novel solutions for some of the key tasks in the field. The contributions derived from this work have potential direct applications in the era of big data: visual datasets are growing exponentially, and new techniques for storing, accessing and managing large-scale visual collections are required.

Divisions: Aston University (General)
Institution: Aston University
Uncontrolled Keywords: image retrieval,video retrieval,cross-modal retrieval
Last Modified: 08 Dec 2023 08:56
Date Deposited: 16 Mar 2020 09:51
Completed Date: 2019-03-25
Authors: Garcia Docampo, Noa
