Identifying Models Behind Text-to-Image Leaderboards
This work exposes security flaws in the voting-based leaderboards used to compare text-to-image models, with implications for fairness and trust in AI evaluation.
The study shows that anonymity in text-to-image leaderboards can be broken: each model's outputs form distinctive clusters in image embedding space, enabling accurate deanonymization across 22 models and 280 prompts (150K images).
Text-to-image (T2I) models are increasingly popular, producing a large share of AI-generated images online. To compare model quality, voting-based leaderboards have become the standard, relying on anonymized model outputs for fairness. In this work, we show that such anonymity can be easily broken. We find that generations from each T2I model form distinctive clusters in the image embedding space, enabling accurate deanonymization without prompt control or training data. Using 22 models and 280 prompts (150K images), our centroid-based method achieves high accuracy and reveals systematic model-specific signatures. We further introduce a prompt-level distinguishability metric and conduct large-scale analyses showing how certain prompts can lead to near-perfect distinguishability. Our findings expose fundamental security flaws in T2I leaderboards and motivate stronger anonymization defenses.
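The centroid idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the image encoder, the distance measure, or how reference images are gathered, so the sketch assumes cosine similarity and uses synthetic vectors in place of real image embeddings. The function names `build_centroids` and `identify` and the labels `model_a` and `model_b` are hypothetical.

```python
import numpy as np


def build_centroids(embeddings, labels):
    """Average the reference embeddings of each model and unit-normalize the mean."""
    labels = np.asarray(labels)
    centroids = {}
    for name in sorted(set(labels.tolist())):
        mean = embeddings[labels == name].mean(axis=0)
        centroids[name] = mean / np.linalg.norm(mean)
    return centroids


def identify(embedding, centroids):
    """Return the model whose centroid has the highest cosine similarity (assumed metric)."""
    v = embedding / np.linalg.norm(embedding)
    return max(centroids, key=lambda name: float(v @ centroids[name]))


# Synthetic stand-in for image embeddings: each hypothetical model's outputs
# scatter around its own direction in an 8-dimensional space.
rng = np.random.default_rng(0)
dim = 8
directions = {"model_a": np.eye(dim)[0], "model_b": np.eye(dim)[1]}
labels = [name for name in directions for _ in range(20)]
embeddings = np.stack(
    [directions[name] + 0.05 * rng.normal(size=dim) for name in labels]
)

centroids = build_centroids(embeddings, labels)

# An "anonymous" leaderboard image, secretly drawn from model_b's cluster.
query = directions["model_b"] + 0.05 * rng.normal(size=dim)
predicted = identify(query, centroids)
print(predicted)
```

In a real setting the synthetic vectors would be replaced by embeddings of images generated by each candidate model, and the query would be the embedding of an anonymized leaderboard image. How well the assignment works then depends on how tightly each model's outputs cluster, which is the property the paper reports.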