Multimodal similarity-preserving hashing
This paper addresses the challenge of efficient cross-modal retrieval for multimedia applications. It is an incremental advance built on a novel neural network architecture.
The paper tackles the problem of hashing multimodal data into a comparable representation space, achieving significant performance improvements over state-of-the-art hashing methods on multimedia retrieval tasks.
We introduce an efficient computational framework for hashing data belonging to multiple modalities into a single representation space where they become mutually comparable. The proposed approach is based on a novel coupled siamese neural network architecture and allows unified treatment of intra- and inter-modality similarity learning. Unlike existing cross-modality similarity learning approaches, our hashing functions are not limited to binarized linear projections and can assume arbitrarily complex forms. We show experimentally that our method significantly outperforms state-of-the-art hashing approaches on multimedia retrieval tasks.
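To make the idea of hashing two modalities into one comparable space concrete, the following is a minimal sketch, not the paper's implementation. It assumes one small nonlinear network per modality, binary codes obtained by taking the sign of the network output, and a contrastive-style loss on the real-valued embeddings. The layer sizes, the margin, the weight `alpha`, and all function names are hypothetical choices made for illustration, and the networks are left untrained.

```python
import math
import random


def make_net(in_dim, hidden_dim, code_bits, rng):
    """Random weights for a one-hidden-layer network (hypothetical layout)."""
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(code_bits)]
    return w1, w2


def embed(net, x):
    """Nonlinear real-valued embedding with every coordinate in (-1, 1)."""
    w1, w2 = net
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]
    return [math.tanh(sum(w * v for w, v in zip(row, hidden))) for row in w2]


def hash_code(net, x):
    """Binary code: sign of each embedding coordinate, stored as 0/1 bits."""
    return [1 if v > 0 else 0 for v in embed(net, x)]


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(p != q for p, q in zip(a, b))


def contrastive_loss(u, v, similar, margin):
    """Assumed loss: pull similar pairs together, push dissimilar pairs
    at least `margin` apart (in Euclidean distance between embeddings)."""
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
    if similar:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


rng = random.Random(0)
code_bits = 16
image_net = make_net(in_dim=8, hidden_dim=6, code_bits=code_bits, rng=rng)
text_net = make_net(in_dim=5, hidden_dim=6, code_bits=code_bits, rng=rng)

# Toy feature vectors standing in for two images and one text description.
image_a = [rng.uniform(-1, 1) for _ in range(8)]
image_b = [rng.uniform(-1, 1) for _ in range(8)]
text_a = [rng.uniform(-1, 1) for _ in range(5)]

# Codes from different modalities have the same length, so they are comparable.
image_code = hash_code(image_net, image_a)
text_code = hash_code(text_net, text_a)
cross_modal_distance = hamming(image_code, text_code)

# Coupled objective: one inter-modality term plus one intra-modality term.
alpha = 0.5  # hypothetical weight on the intra-modality term
inter = contrastive_loss(embed(image_net, image_a), embed(text_net, text_a),
                         similar=True, margin=2.0)
intra = contrastive_loss(embed(image_net, image_a), embed(image_net, image_b),
                         similar=False, margin=2.0)
total_loss = inter + alpha * intra
print(cross_modal_distance, total_loss)
```

In this sketch both networks emit codes of the same length, so an image can be compared against a text with a plain Hamming distance. Training would minimize `total_loss` over many similar and dissimilar pairs, which is omitted here.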