CVIRMMNov 6, 2020

Deep Cross-modal Hashing via Margin-dynamic-softmax Loss

arXiv:2011.03451v2
AI Analysis

This work addresses cross-modal retrieval for multimedia applications, offering a novel approach that is incremental but improves upon existing supervised methods.

The paper tackles the problem of cross-modal hashing for efficient search by proposing a method that avoids defining similarity between datapoints, using a novel margin-dynamic-softmax loss to directly utilize proxy hash codes, resulting in improved retrieval performance with concrete gains reported in experiments.

Due to their high retrieval efficiency and low storage cost for cross-modal search task, cross-modal hashing methods have attracted considerable attention. For the supervised cross-modal hashing methods, how to make the learned hash codes preserve semantic information sufficiently contained in the label of datapoints is the key to further enhance the retrieval performance. Hence, almost all supervised cross-modal hashing methods usually depends on defining a similarity between datapoints with the label information to guide the hashing model learning fully or partly. However, the defined similarity between datapoints can only capture the label information of datapoints partially and misses abundant semantic information, then hinders the further improvement of retrieval performance. Thus, in this paper, different from previous works, we propose a novel cross-modal hashing method without defining the similarity between datapoints, called Deep Cross-modal Hashing via \textit{Margin-dynamic-softmax Loss} (DCHML). Specifically, DCHML first trains a proxy hashing network to transform each category information of a dataset into a semantic discriminative hash code, called proxy hash code. Each proxy hash code can preserve the semantic information of its corresponding category well. Next, without defining the similarity between datapoints to supervise the training process of the modality-specific hashing networks , we propose a novel \textit{margin-dynamic-softmax loss} to directly utilize the proxy hashing codes as supervised information. Finally, by minimizing the novel \textit{margin-dynamic-softmax loss}, the modality-specific hashing networks can be trained to generate hash codes which can simultaneously preserve the cross-modal similarity and abundant semantic information well.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes