IR · CV · May 28

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking

arXiv:2605.29287 · 21.2 · h-index: 6
Predicted impact: top 44% in IR · last 90 days
Originality: Incremental advance
AI Analysis

For content platforms needing efficient and accurate I2I retrieval, UniNote addresses the trade-offs between representation quality and serving latency in industrial settings.

UniNote proposes a unified embedding model for industrial Item-to-Item retrieval, achieving SOTA performance across diverse tasks and significant improvements in retrieval quality and cost efficiency when deployed at Xiaohongshu.

Item-to-Item (I2I) retrieval is a fundamental part of modern content platforms, supporting critical industrial workflows from recommendation engines to content auditing. While multimodal embedding methods have advanced general retrieval, they often falter in I2I scenarios due to the challenges of balancing global content representation with fine-grained local retrieval, the systemic inefficiency of decoupled embedding-and-ranking pipelines, and the inherent trade-offs between model precision and serving latency. To solve these issues, we propose UniNote, a unified embedding model designed for industrial I2I retrieval. Tailored retrieval strategies are introduced to support representation learning over complex, multimodal content at varying granularities. To operationalize these strategies, UniNote employs a two-stage training paradigm: the first stage leverages contrastive SFT to establish robust base embeddings, while the second stage refines ranking quality through a reinforcement learning (RL) process that aligns the model with content relevance. Our results show that UniNote achieves SOTA performance across diverse I2I tasks. Deployed at Xiaohongshu and integrated with Matryoshka Representation Learning (MRL), UniNote achieved significant improvements in retrieval quality and cost efficiency in large-scale applications.
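
The abstract names contrastive training and Matryoshka Representation Learning (MRL) but gives no formulas, so the following is only a minimal NumPy sketch of how the two ideas are commonly combined, not UniNote's actual objective. It assumes an in-batch InfoNCE loss (row i of `queries` is the positive for row i of `items`) averaged over several embedding prefixes. The function names, the prefix sizes in `dims`, and the `temperature` value are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def truncate_embedding(emb, dim):
    """MRL-style serving: keep the first `dim` coordinates, then re-normalize."""
    return l2_normalize(emb[..., :dim])

def info_nce_loss(queries, items, temperature=0.05):
    """In-batch contrastive loss (assumed InfoNCE form, not taken from the paper).

    Row i of `queries` is treated as the positive for row i of `items`;
    every other row in the batch acts as a negative.
    """
    q = l2_normalize(queries)
    k = l2_normalize(items)
    logits = q @ k.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(queries, items, dims=(64, 128, 256), temperature=0.05):
    """Average the contrastive loss over nested prefixes of the embedding.

    `dims` is a hypothetical set of prefix sizes; training on all of them is
    what lets a shorter prefix be served on its own later.
    """
    losses = [info_nce_loss(queries[:, :d], items[:, :d], temperature) for d in dims]
    return float(np.mean(losses))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 256))
    positives = q + 0.01 * rng.normal(size=q.shape)   # near-duplicates of q
    randoms = rng.normal(size=q.shape)                # unrelated items
    print("loss with matched pairs:", matryoshka_loss(q, positives))
    print("loss with random pairs: ", matryoshka_loss(q, randoms))
    print("served shape:", truncate_embedding(q, 64).shape)
```

At serving time, a shorter prefix from `truncate_embedding` reduces index size and distance-computation cost, which is one plausible reading of the cost-efficiency benefit the abstract attributes to MRL.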

