DC · AI · LG · OS · Jun 21, 2025

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

UW
arXiv:2506.17538v1 · 1 citation · h-index: 27 · Has Code
Originality: Incremental advance
AI Analysis

ConsumerBench addresses the benchmarking needs of developers and system designers deploying GenAI on consumer devices. The advance is incremental: it builds on existing benchmarking concepts and adds new multi-application scenarios.

The paper tackles the challenge of evaluating Generative AI applications on end-user devices by introducing ConsumerBench, a benchmarking framework that simulates realistic multi-application scenarios on constrained hardware. Its experiments reveal inefficient resource sharing and unfair scheduling under greedy allocation, and the authors recommend custom kernels for consumer-grade GPUs and SLO-aware scheduling to improve performance.

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
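To make the application-level metric concrete, here is a minimal sketch of computing per-application SLO attainment. It assumes attainment means the fraction of requests whose latency falls within a per-application threshold. The `Request` type, the `slo_attainment` function, the application names, and all numbers are hypothetical illustrations, not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: str          # hypothetical application name
    latency_s: float  # observed end-to-end latency in seconds


def slo_attainment(requests, slo_s):
    """Return, per app, the fraction of requests with latency <= that app's SLO."""
    met, total = {}, {}
    for r in requests:
        total[r.app] = total.get(r.app, 0) + 1
        if r.latency_s <= slo_s[r.app]:
            met[r.app] = met.get(r.app, 0) + 1
    return {app: met.get(app, 0) / n for app, n in total.items()}


# Made-up latencies for two concurrently running applications.
requests = [
    Request("chatbot", 0.8),
    Request("chatbot", 1.4),
    Request("imagegen", 9.0),
    Request("imagegen", 12.5),
]
slo_s = {"chatbot": 1.0, "imagegen": 10.0}  # assumed SLO thresholds in seconds

attainment = slo_attainment(requests, slo_s)
print(attainment)
```

Reporting attainment per application rather than as one aggregate makes unfairness visible: under greedy allocation, one application can meet its SLO while another consistently misses.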

Code Implementations: 1 repo
