AI · Mar 16

InterPol: De-anonymizing LM Arena via Interpolated Preference Learning

arXiv:2603.15220 · 80.8 · h-index: 1
Predicted impact: top 36% in AI · last 90 days
Originality: Highly original
AI Analysis

This work exposes a severe vulnerability in anonymous leaderboards, a finding that bears directly on the integrity of model evaluation in AI research, though it is incremental in that it builds on prior de-anonymization attempts.

The paper tackles the de-anonymization of models on voting-based leaderboards such as LM Arena, whose reliability assumes strict anonymity. It introduces INTERPOL, a model-driven identification framework that learns to distinguish target models using interpolated preference data. The authors report significantly higher identification accuracy than existing baselines and demonstrate the real-world threat through ranking manipulation simulations.

Strict anonymity of model responses is key to the reliability of voting-based leaderboards such as LM Arena. While prior studies have attempted to compromise this assumption using simple statistical features like TF-IDF or bag-of-words, these methods often lack the discriminative power to distinguish between stylistically similar or within-family models. To overcome these limitations and expose the severity of the vulnerability, we introduce INTERPOL, a model-driven identification framework that learns to distinguish target models from others using interpolated preference data. Specifically, INTERPOL captures deep stylistic patterns that superficial statistical features miss by synthesizing hard negative samples through model interpolation and employing an adaptive curriculum learning strategy. Extensive experiments demonstrate that INTERPOL significantly outperforms existing baselines in identification accuracy. Furthermore, we quantify the real-world threat of our findings through ranking manipulation simulations on Arena battle data.
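To make the ranking manipulation threat concrete, the following is a minimal toy sketch, not the paper's simulation protocol and not LM Arena's actual rating pipeline. It assumes a simple online Elo update, synthetic "true" strengths, and an attacker who casts a fixed share of votes and recognizes the target model with a given accuracy. All names here (simulate_arena, attacker_share, detect_acc, the model labels) are hypothetical and chosen only for illustration.

```python
import random


def expected_score(r_a, r_b):
    """Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate_arena(true_strength, target, n_battles,
                   attacker_share, detect_acc, k=4.0, seed=0):
    """Run a toy arena and return the final Elo ratings.

    true_strength : dict mapping model name to latent strength (assumed)
    target        : model the attacker wants to promote
    attacker_share: fraction of votes cast by the attacker (assumed)
    detect_acc    : probability the attacker identifies the target
    """
    rng_honest = random.Random(seed)       # pairings and honest votes
    rng_attack = random.Random(seed + 1)   # attacker decisions only
    models = sorted(true_strength)
    ratings = {m: 1000.0 for m in models}

    for _ in range(n_battles):
        a, b = rng_honest.sample(models, 2)

        # Honest vote: A wins with probability given by the latent strengths.
        p_a = expected_score(true_strength[a], true_strength[b])
        score_a = 1.0 if rng_honest.random() < p_a else 0.0

        # Attacker vote: if the target is present and recognized,
        # the vote is forced in the target's favor.
        if target in (a, b):
            is_attacker = rng_attack.random() < attacker_share
            recognized = rng_attack.random() < detect_acc
            if is_attacker and recognized:
                score_a = 1.0 if a == target else 0.0

        # Standard zero-sum Elo update.
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta

    return ratings


# Synthetic strengths; "target" is the weakest model by construction.
strengths = {"target": 1000.0, "m1": 1050.0, "m2": 1100.0, "m3": 1150.0}

baseline = simulate_arena(strengths, "target", 20000,
                          attacker_share=0.0, detect_acc=0.9)
attacked = simulate_arena(strengths, "target", 20000,
                          attacker_share=0.3, detect_acc=0.9)

for name in sorted(strengths):
    print(f"{name}: baseline={baseline[name]:.1f} attacked={attacked[name]:.1f}")
```

In this toy setting, raising detect_acc raises the share of target battles the attacker can tilt, which is one way to see why identification accuracy translates into ranking influence. The sketch uses separate random streams for honest and attacker decisions so that the baseline and attacked runs see the same pairings and honest votes.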

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
