LG · AI · CV · Nov 25, 2025

On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices

arXiv:2511.19986v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently deploying large models on resource-constrained edge devices, particularly for multi-task applications such as autonomous driving. The contribution is incremental, since it builds on existing sparsity methods.

The paper tackles the high I/O overhead incurred when sparse large models switch tasks on edge devices. It introduces an on-demand multi-task sparsity framework that maximizes parameter reuse across tasks. In experiments on an autonomous driving platform, the framework accelerates task switching by over 6.6X on average compared to existing methods.

Sparsity is essential for deploying large models on resource-constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tasks to maximize overlap. By dynamically loading only the small differential set of blocks required for the next task, our method effectively mitigates the cold-start latency inherent in traditional monolithic approaches. Experiments on a real-world autonomous driving platform demonstrate that our framework achieves superior switching efficiency, accelerating task switching by over 6.6X on average compared to existing sparsity methods.
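
The differential-loading idea in the abstract can be illustrated with plain set arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes each task's sparse model is described by a set of block identifiers, and that blocks not needed by the next task are evicted. The `BlockId` layout, the function name `plan_switch`, and the eviction policy are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Set, Tuple

# Hypothetical block identifier: (layer name, block index within that layer).
BlockId = Tuple[str, int]


def plan_switch(resident: Set[BlockId], next_task: Set[BlockId]):
    """Plan a task switch given the blocks in memory and the blocks needed next.

    Returns (to_load, to_evict, reused):
      to_load  - blocks the next task needs that are not yet resident
      to_evict - resident blocks the next task does not use (assumed policy)
      reused   - blocks shared by both tasks, which incur no I/O
    """
    to_load = next_task - resident
    to_evict = resident - next_task
    reused = resident & next_task
    return to_load, to_evict, reused


# Toy example: two tasks whose sparse structures overlap in two of three blocks.
task_a = {("fc1", 0), ("fc1", 1), ("fc2", 0)}
task_b = {("fc1", 1), ("fc2", 0), ("fc2", 3)}

to_load, to_evict, reused = plan_switch(resident=task_a, next_task=task_b)

# Fraction of the next task's blocks that are already in memory.
reuse_ratio = len(reused) / len(task_b)
```

In this toy case only one of task B's three blocks has to be read from storage, whereas a monolithic approach would reload all three. The more the sparse structures of the tasks are aligned, the smaller `to_load` becomes.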

