OS | AI | PF | SE | Jun 24, 2025

MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

arXiv:2506.19884v1 | 1 citation | h-index: 17
Originality: Highly original
AI Analysis

This paper addresses energy efficiency for on-device LLM inference on battery-limited mobile devices. It is assessed as a novel system-level solution rather than an incremental improvement.

The paper tackles the problem of high energy consumption during LLM decoding on mobile devices by introducing MNN-AECS, an engine-level system that dynamically selects low-power CPU cores, resulting in an average 23% energy reduction without slowdown compared to the original MNN and up to 78% energy savings against other engines.

As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, and yet most existing works focus on accelerating the prefill phase, neglecting energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS) and integrate it into MNN to create the energy-efficient version, MNN-AECS, the first engine-level system solution without requiring root access or OS modifications for energy-efficient LLM decoding. MNN-AECS is designed to reduce LLM decoding energy while keeping decode speed within an acceptable slowdown threshold by dynamically selecting low-power CPU cores. MNN-AECS is evaluated across 5 Android and 2 iOS devices on 5 popular LLMs of various sizes. Compared to original MNN, MNN-AECS cuts down energy use by 23% without slowdown averaged over all 7 devices and 4 datasets. Against other engines, including llama.cpp, executorch, mllm, and MediaPipe, MNN-AECS delivers 39% to 78% energy saving and 12% to 363% speedup on average.
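The selection idea described above (minimize decode energy while keeping decode speed within a slowdown threshold) can be illustrated with a minimal sketch. This is not the paper's AECS algorithm or any MNN API. The `CoreConfig` type, the `select_config` function, the `max_slowdown` parameter, and all speed and power figures are hypothetical, chosen only to show the constraint. The sketch assumes each candidate core configuration has already been profiled for decode speed and average power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical profile of one CPU core selection during decoding."""
    name: str
    tokens_per_s: float  # measured decode speed (assumed to be profiled beforehand)
    power_w: float       # measured average power draw in watts (assumed)

    @property
    def joules_per_token(self) -> float:
        # Energy per token = power / throughput.
        return self.power_w / self.tokens_per_s


def select_config(candidates, baseline, max_slowdown=0.1):
    """Pick the lowest energy-per-token config within the slowdown threshold.

    max_slowdown is the tolerated fractional speed loss relative to baseline
    (0.1 means at most 10% slower). Falls back to baseline if nothing qualifies.
    """
    min_speed = baseline.tokens_per_s * (1.0 - max_slowdown)
    feasible = [c for c in candidates if c.tokens_per_s >= min_speed]
    if not feasible:
        return baseline
    return min(feasible, key=lambda c: c.joules_per_token)


# Illustrative numbers only; they are not measurements from the paper.
baseline = CoreConfig("4 big cores", tokens_per_s=10.0, power_w=5.0)
candidates = [
    baseline,
    CoreConfig("2 big cores", tokens_per_s=9.5, power_w=3.0),
    CoreConfig("4 little cores", tokens_per_s=4.0, power_w=1.0),
]

chosen = select_config(candidates, baseline, max_slowdown=0.1)
print(chosen.name, round(chosen.joules_per_token, 3))
```

With a 10% threshold, the little-core configuration is excluded for being too slow even though it uses the least energy per token. Loosening the threshold trades decode speed for further energy savings.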
