Damianos Galanopoulos

h-index: 2

2 papers

6 citations

2 Papers

1.4 | CV | Nov 21, 2022 | Code

Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval

Damianos Galanopoulos, Vasileios Mezaris

In this paper, we tackle the cross-modal video retrieval problem and, more specifically, we focus on text-to-video retrieval. We investigate how to optimally combine multiple diverse textual and visual features into feature pairs that lead to generating multiple joint feature spaces, which encode text-video pairs into comparable representations. To learn these representations, our proposed network architecture is trained by following a multiple space learning procedure. Moreover, at the retrieval stage, we introduce additional softmax operations for revising the inferred query-video similarities. Extensive experiments in several setups based on three large-scale datasets (IACC.3, V3C1, and MSR-VTT) lead to conclusions on how to best combine text-visual features and document the performance of the proposed network. Source code is made publicly available at: https://github.com/bmezaris/TextToVideoRetrieval-TtimesV

1.6 | IR | Nov 13, 2020

Elejalde Erick, Galanopoulos Damianos, Niederee Claudia et al.

Migration, and especially irregular migration, is a critical issue for border agencies and society in general. Migration-related situations and decisions are influenced by various factors, including perceptions about migration routes and target countries. An improved understanding of such factors can be achieved by systematic automated analyses of media and social media channels, and of the videos and images published in them. However, the multifaceted nature of migration and the variety of ways in which migration-related aspects are expressed in images and videos make finding and automatically analyzing migration-related multimedia content a challenging task. We propose a novel approach that effectively bridges the gap between a substantiated domain understanding, encapsulated into a set of migration-related semantic concepts, and the expression of such concepts in a video, by introducing an advanced video analysis and retrieval method for this purpose.