Basile Van Cooten

h-index: 3
2 papers

2 Papers

17.6 IR Apr 23
From Tokens to Concepts: Leveraging SAE for SPLADE

Yuxuan Zong, Mathias Vast, Basile Van Cooten et al.

Learned Sparse IR models, such as SPLADE, offer an excellent efficiency-effectiveness tradeoff. However, they rely on the underlying backbone vocabulary, which might hinder performance (polysemy and synonymy) and pose a challenge for multi-lingual and multi-modal usages. To solve this limitation, we propose to replace the backbone vocabulary with a latent space of semantic concepts learned using Sparse Auto-Encoders (SAE). Throughout this paper, we study the compatibility of these two concepts, explore training approaches, and analyze the differences between our SAE-SPLADE model and traditional SPLADE models. Our experiments demonstrate that SAE-SPLADE achieves retrieval performance comparable to SPLADE on both in-domain and out-of-domain tasks while offering improved efficiency.

IR Jul 19, 2025
Understanding Matching Mechanisms in Cross-Encoders

Mathias Vast, Basile Van Cooten, Laure Soulier et al.

Neural IR architectures, particularly cross-encoders, are highly effective models whose internal mechanisms are mostly unknown. Most works trying to explain their behavior have focused on high-level processes (e.g., what in the input influences the prediction, or whether the model adheres to known IR axioms) but fall short of describing the matching process. Instead of Mechanistic Interpretability approaches, which specifically aim at explaining the hidden mechanisms of neural models, we demonstrate that more straightforward methods can already provide valuable insights. In this paper, we first focus on the attention process and extract causal insights highlighting the crucial roles of some attention heads in this process. Second, we provide an interpretation of the mechanism underlying matching detection.