LG PF, Nov 17, 2025

Hardware optimization on Android for inference of AI models

arXiv:2511.13453v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient AI deployment on mobile devices to improve user experience. It is incremental, however, in that it applies existing methods such as quantization and hardware acceleration to specific models.

The paper addresses the optimization of AI model inference on Android by evaluating execution configurations for object detection and image classification tasks, and empirically determines the configuration with the best trade-off between minimal accuracy degradation and maximal inference speed-up.

Artificial Intelligence models are now pervasive in mobile computing, with use cases ranging from virtual assistants to advanced image processing. A good mobile user experience requires minimal latency and high responsiveness from deployed AI models, which raises challenges ranging from execution strategies that meet real-time constraints to the exploitation of heterogeneous hardware architectures. In this paper, we investigate and propose optimal execution configurations for AI models on an Android system, focusing on two critical tasks: object detection (YOLO family) and image classification (ResNet). These configurations combine various model quantization schemes with the use of on-device accelerators, specifically the GPU and NPU. Our core objective is to empirically determine the combination that achieves the best trade-off between minimal accuracy degradation and maximal inference speed-up.
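To make the accuracy side of that trade-off concrete, the following is a minimal sketch of 8-bit affine (asymmetric) quantization, one common scheme. The abstract does not say which quantization schemes the paper evaluates, so this is an illustrative assumption rather than the authors' method. The function names `quantize_affine` and `dequantize_affine` and the sample values are hypothetical.

```python
def quantize_affine(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point.

    Illustrative sketch only; not the scheme used in the paper.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when all values are zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round to the nearest integer step and clamp to the integer range.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, chosen only to exercise the functions.
weights = [-1.0, -0.5, 0.0, 0.25, 2.0]
q, scale, zero_point = quantize_affine(weights)
restored = dequantize_affine(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

The rounding error introduced here is the source of the accuracy degradation that the paper weighs against inference speed-up. Fewer bits give a coarser step size (`scale`) and therefore a larger reconstruction error.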
