AI | CL | HC | Dec 4, 2025

The AI Consumer Index (ACE)

arXiv:2512.04921v3 | 1 citation | h-index: 28
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of assessing how useful AI models are to consumers in practice. The contribution is incremental: it introduces a new benchmark rather than novel methods.

The authors introduce the AI Consumer Index (ACE), a benchmark that evaluates frontier AI models on everyday consumer tasks. The top model scored only 56.1%, and models often hallucinated key information such as prices, revealing a significant performance gap.

We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform everyday consumer tasks. ACE contains a hidden heldout set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open sourcing 80 cases as a devset with a CC-BY license. For the ACE leaderboard we evaluated 10 frontier models (with websearch turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) at 55.2% and GPT 5.1 (Thinking = High) at 55.1%. Model scores differ across domains, and in Shopping the top model scores under 50%. We find that models are prone to hallucinating key information, such as prices. ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.

