PF AI HC NI | Mar 10, 2025

Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices

arXiv:2504.00002v1 | 9 citations | h-index: 5 | Proceedings of the 2nd International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things

Originality: Synthesis-oriented

AI Analysis

This study addresses the problem of efficiency constraints for deploying LLMs on resource-limited mobile devices, providing insights for system designers, though it is incremental as it focuses on measurement rather than new solutions.

The paper conducted a measurement study to evaluate efficiency tradeoffs for deploying large language models (LLMs) on mobile devices, finding that only small models (<4B parameters) can run on-device with significant latency (>30 seconds) and quality limitations compared to cloud-based deployments (<10 seconds).

Recent advancements in large language models (LLMs) have prompted interest in deploying these models on mobile devices to enable new applications without relying on cloud connectivity. However, the efficiency constraints of deploying LLMs on resource-limited devices present significant challenges. In this paper, we conduct a comprehensive measurement study to evaluate the efficiency tradeoffs between mobile-based, edge-based, and cloud-based deployments for LLM applications. We implement AutoLife-Lite, a simplified LLM-based application that analyzes smartphone sensor data to infer user location and activity contexts. Our experiments reveal that: (1) Only small-size LLMs (<4B parameters) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models; (2) Model compression is effective in lowering the hardware requirement, but may lead to significant performance degradation; (3) The latency to run LLMs on mobile devices with meaningful output is significant (>30 seconds), while cloud services demonstrate better time efficiency (<10 seconds); (4) Edge deployments offer intermediate tradeoffs between latency and model capabilities, with different results on CPU-based and GPU-based settings. These findings provide valuable insights for system designers on the current limitations and future directions for on-device LLM applications.
