AI CL HC Dec 4, 2025

The AI Consumer Index (ACE)

Julien Benchek, Rohit Shetty, Benjamin Hunsberger, Ajay Arun, Zach Richards, Brendan Foody, Osvald Nitski, Bertie Vidgen

arXiv:2512.04921v31 citations, h-index: 28

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of assessing AI models' practical utility for consumers, but it is incremental: it introduces a new benchmark without novel methods.

The authors introduce the AI Consumer Index (ACE), a benchmark for evaluating frontier AI models on everyday consumer tasks. The top model scored only 56.1%, and models often hallucinated key information such as prices, revealing a significant performance gap.

We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform everyday consumer tasks. ACE contains a hidden held-out set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open-sourcing 80 cases as a devset under a CC-BY license. For the ACE leaderboard, we evaluated 10 frontier models (with web search turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) at 55.2% and GPT 5.1 (Thinking = High) at 55.1%. Model scores differ across domains, and in shopping the top model scores under 50%. We find that models are prone to hallucinating key information, such as prices. ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.

