Order-Preserving Dimension Reduction for Multimodal Semantic Embedding
This work addresses efficiency challenges in time-sensitive vision applications by enabling faster multimodal retrieval, though the contribution is incremental because it builds on existing dimension-reduction techniques.
The paper tackles the computational expense of k-nearest neighbor search in multimodal data by proposing Order-Preserving Dimension Reduction (OPDR) to reduce embedding dimensionality while preserving ranking accuracy, achieving high recall with significantly lower costs.
Searching for the $k$-nearest neighbors (KNN) in multimodal data retrieval is computationally expensive, particularly due to the inherent difficulty in comparing similarity measures across different modalities. Recent advances in multimodal machine learning address this issue by mapping data into a shared embedding space; however, the high dimensionality of these embeddings (hundreds to thousands of dimensions) presents a challenge for time-sensitive vision applications. This work proposes Order-Preserving Dimension Reduction (OPDR), which aims to reduce the dimensionality of embeddings while preserving the ranking of the KNN in the lower-dimensional space. One notable component of OPDR is a new measure function that quantifies KNN quality as a global metric, from which we derive a closed-form map between the target dimensionality and key contextual parameters. We have integrated OPDR with multiple state-of-the-art dimension-reduction techniques, distance functions, and embedding models; experiments on a variety of multimodal datasets demonstrate that OPDR effectively retains high recall accuracy while significantly reducing computational costs.
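
To make the notion of preserving KNN under dimension reduction concrete, the following minimal sketch measures how many of each point's $k$ nearest neighbors survive a projection to a lower dimension. It is an illustration under stated assumptions, not the OPDR method itself: the abstract does not specify OPDR's measure function or its closed-form map, so the sketch substitutes plain PCA, Euclidean distance, and a simple average-overlap recall. The function names (`knn_indices`, `pca_reduce`, `knn_recall`) and the synthetic data are hypothetical.

```python
import numpy as np


def knn_indices(x, k):
    """Return, for each row of x, the indices of its k nearest rows
    under Euclidean distance, excluding the row itself."""
    sq = np.sum(x * x, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def pca_reduce(x, dim):
    """Project x onto its top `dim` principal components (PCA via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:dim].T


def knn_recall(x_full, x_low, k):
    """Average fraction of each point's k nearest neighbors in x_full
    that are also among its k nearest neighbors in x_low."""
    full = knn_indices(x_full, k)
    low = knn_indices(x_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(full, low)]
    return float(np.mean(overlap)) / k


# Synthetic stand-in for embeddings (an assumption): 200 points in 64
# dimensions that lie close to an 8-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 8))
x = latent @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(200, 64))

for dim in (2, 8, 32):
    recall = knn_recall(x, pca_reduce(x, dim), k=10)
    print(f"dim={dim:2d}  recall@10={recall:.3f}")
```

Sweeping the target dimension in this way gives an empirical picture of the trade-off between dimensionality and neighbor preservation; the closed-form map described above is meant to relate the target dimensionality to the contextual parameters directly.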