From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices
The paper addresses the problem of adapting models efficiently on resource-constrained edge devices and offers practical guidance for method selection. Its contribution is incremental, since it applies existing PEFT techniques to a new domain.
This paper benchmarks parameter-efficient fine-tuning (PEFT) methods on convolutional neural networks for edge devices. It finds that adapter-based PEFT can reduce FLOPs during model updates by up to 95% on convolutional architectures optimized for edge deployment, but that the evaluated PEFT methods are only half as memory-efficient on depthwise-separable convolutional architectures as they are on LLMs.
Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a downstream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolutional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.
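To give some intuition for why low-rank adapters might save less on depthwise convolutions, the sketch below counts trainable parameters for a single layer. It is not taken from the paper: it assumes one common way of applying LoRA to a convolution, in which the kernel is flattened to a matrix of shape (out_channels, in_channels / groups * k * k) and the update is factored into two matrices of rank r. The helper names `conv_params` and `lora_params` and the 256-channel layer sizes are hypothetical and chosen only for illustration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias ignored)."""
    return c_out * (c_in // groups) * k * k


def lora_params(c_in, c_out, k, rank, groups=1):
    """Trainable weights of a rank-`rank` update B @ A, assuming the kernel
    is flattened to shape (c_out, fan_in). This is an illustrative
    formulation, not necessarily the one used in the paper."""
    fan_in = (c_in // groups) * k * k
    return rank * (c_out + fan_in)  # B: c_out x rank, A: rank x fan_in


# Compare a standard and a depthwise 3x3 layer with 256 channels at rank 4.
for name, groups in [("standard 3x3", 1), ("depthwise 3x3", 256)]:
    full = conv_params(256, 256, 3, groups)
    lora = lora_params(256, 256, 3, rank=4, groups=groups)
    print(f"{name}: full={full}, lora={lora}, ratio={lora / full:.1%}")
```

Under these assumptions the adapter for the standard layer trains about 1.7% of the original weights, while the adapter for the depthwise layer trains about 46%, because a depthwise kernel is already small relative to its channel count. The figures apply only to this toy layer and say nothing about the paper's measured memory or FLOP results.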