Training for Compositional Sensitivity Reduces Dense Retrieval Generalization
This work addresses a critical limitation of dense retrieval systems, though its contribution toward robustness to compositional changes is incremental.
The paper examines the brittleness of dense retrieval embeddings to compositional edits such as negation and role swaps. It finds that training to improve sensitivity to such edits reduces zero-shot retrieval performance on NanoBEIR (an 8-9% mean nDCG@10 drop on small backbones, up to 40% on medium ones), while a small Transformer over token similarity maps reliably separates near-misses.
Dense retrieval compresses texts into single embeddings ranked by cosine similarity. While efficient for recall, this interface is brittle for identity-level matching: minimal compositional edits (negation, role swaps) flip meaning yet retain high similarity. Motivated by geometric results for unit-sphere cosine spaces (Kang et al., 2025), we test this retrieval-composition tension in text-only retrieval. Across four dual-encoder backbones, adding structure-targeted negatives consistently reduces zero-shot NanoBEIR retrieval (an 8-9% mean nDCG@10 drop on small backbones; up to 40% on medium ones), while only partially improving pooled-space separation. Treating pooled cosine as a recall interface, we then benchmark verifiers that score token-token cosine maps. MaxSim (late interaction) excels at reranking but fails to reject structural near-misses, whereas a small Transformer over similarity maps reliably separates near-misses under end-to-end training.
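To make the three scoring interfaces concrete, the sketch below contrasts a mean-pooled cosine score, the token-token cosine map, and MaxSim on a role-swapped pair. It is a toy illustration under loudly stated assumptions, not the paper's setup: token vectors are random and non-contextual (real encoders are contextual, so scores would not be exactly equal), mean pooling stands in for whatever pooling each backbone uses, and the helper names (`pooled_cosine`, `similarity_map`, `maxsim`, `embed`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    # Scale vectors to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def pooled_cosine(q_tok, d_tok):
    # Single-vector interface: mean-pool token vectors, then take the cosine.
    q = l2_normalize(q_tok.mean(axis=0))
    d = l2_normalize(d_tok.mean(axis=0))
    return float(q @ d)


def similarity_map(q_tok, d_tok):
    # Token-token cosine map S[i, j] = cos(q_i, d_j), shape (len(q), len(d)).
    return l2_normalize(q_tok) @ l2_normalize(d_tok).T


def maxsim(q_tok, d_tok):
    # Late interaction: each query token keeps only its best-matching
    # document token, so word order in the document is ignored.
    return float(similarity_map(q_tok, d_tok).max(axis=1).sum())


# Hypothetical toy vocabulary of static (non-contextual) token vectors.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=16) for w in ["dog", "bites", "man"]}


def embed(sentence):
    # Stack one vector per whitespace token (no context, no subwords).
    return np.stack([vocab[w] for w in sentence.split()])


q = embed("dog bites man")
d = embed("man bites dog")  # role swap: same tokens, different meaning

print("pooled cosine:", pooled_cosine(q, d))
print("MaxSim:", maxsim(q, d))
print("map (query vs swapped doc):\n", np.round(similarity_map(q, d), 2))
print("map (query vs itself):\n", np.round(similarity_map(q, q), 2))
```

In this toy case both the pooled cosine and MaxSim reach their maximum for the role swap, because neither score depends on where matching tokens sit. The similarity map still differs: the high-similarity cells fall on the anti-diagonal instead of the diagonal. That positional layout is the kind of signal a model reading the whole map, such as the small Transformer described above, can in principle exploit.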