CV · Mar 29, 2024

Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter

Yuiko Sakuma, Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

arXiv:2403.20080v1 · 3.71 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying large vision models on diverse hardware, offering incremental improvements in training memory efficiency and search-space optimization.

The paper tackles the problem of compressing vision foundation models for efficient deployment by fine-tuning them into mixed-precision quantized supernets, achieving about a 95% reduction in bit-wise operations (BitOPs) without performance degradation.
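
To make the BitOPs metric concrete, here is a minimal sketch of how a BitOPs reduction can be computed. It assumes a common convention (not stated on this page) in which each multiply-accumulate (MAC) costs `w_bits * a_bits` bit operations. The `QuantLinear` class, the layer sizes, and the chosen bit-widths are hypothetical and illustrative only. They are not the paper's configuration or result.

```python
from dataclasses import dataclass


@dataclass
class QuantLinear:
    """Hypothetical description of one quantized linear layer (not from the paper)."""
    tokens: int        # number of input tokens N
    in_features: int   # d_in
    out_features: int  # d_out
    w_bits: int        # weight bit-width
    a_bits: int        # activation bit-width

    def macs(self) -> int:
        # Multiply-accumulates for an (N x d_in) @ (d_in x d_out) product.
        return self.tokens * self.in_features * self.out_features

    def bitops(self) -> int:
        # Assumed convention: each MAC costs w_bits * a_bits bit operations.
        return self.macs() * self.w_bits * self.a_bits


def total_bitops(layers) -> int:
    return sum(layer.bitops() for layer in layers)


def bitops_reduction(baseline, compressed) -> float:
    # Fraction of BitOPs removed relative to the baseline model.
    return 1.0 - total_bitops(compressed) / total_bitops(baseline)


# Illustrative two-layer MLP: 32-bit baseline vs. a mixed-precision subnet.
N, D, H = 4096, 768, 3072
baseline = [QuantLinear(N, D, H, 32, 32), QuantLinear(N, H, D, 32, 32)]
subnet = [QuantLinear(N, D, H, 4, 8), QuantLinear(N, H, D, 6, 8)]
print(f"BitOPs reduction: {bitops_reduction(baseline, subnet):.2%}")
```

With these made-up settings, the script prints `BitOPs reduction: 96.09%`. Because both layers have the same MAC count, the reduction depends only on the per-MAC costs: (4·8 + 6·8) / (2·32·32) = 80 / 2048.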

Compressing large, high-performing vision foundation models (VFMs) to arbitrary budgets of bit-wise operations (BitOPs) allows them to be deployed on a variety of hardware. We propose to fine-tune a VFM into a mixed-precision quantized supernet. Supernet-based neural architecture search (NAS) can be adopted for this purpose: it trains a single supernet, from which subnets satisfying arbitrary hardware budgets can then be extracted. However, existing methods struggle to optimize the mixed-precision search space and incur large memory costs during training. To tackle these challenges, we first study effective search-space design for fine-tuning a VFM, comparing different operators (such as resolution, feature size, width, depth, and bit-width) in terms of performance and BitOPs reduction. Second, we propose memory-efficient supernet training using a low-rank adapter (LoRA) and a progressive training strategy. We evaluate the proposed method on the recently proposed VFM Segment Anything Model (SAM), fine-tuned on segmentation tasks. The searched model achieves about a 95% reduction in BitOPs without performance degradation.
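
The idea of a LoRA-based mixed-precision supernet can be sketched in plain Python. The sketch is a forward pass only and rests on several assumptions that are not taken from the paper:

- a hypothetical bit-width search space `BIT_CHOICES = (2, 4, 8)`;
- a symmetric per-tensor fake quantizer;
- quantizing the merged weight `W + (alpha / r) * B @ A`.

The names `LoRALinear`, `fake_quantize`, and `sample_bit_config` are illustrative. Each training step samples a per-layer bit-width configuration (a subnet) and runs the shared weights at those precisions. In LoRA-style training, only the small factors `A` and `B` would receive gradients, using a straight-through estimator for the rounding. The frozen pretrained `W` therefore needs no gradients or optimizer state, and this is where the memory savings come from. The paper's progressive training strategy is not modeled here.

```python
import random

BIT_CHOICES = (2, 4, 8)  # hypothetical per-layer bit-width search space


def fake_quantize(values, bits):
    """Symmetric uniform quantize-then-dequantize with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def matmul(a, b):
    # Plain nested-list matrix product: (m x k) @ (k x n) -> (m x n).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus trainable low-rank factors B (d_out x r), A (r x d_in)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scaling = alpha / rank
        # A is small random, B is zero, so the adapter starts as a no-op.
        self.lora_a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self, bits):
        # Merge W + scaling * (B @ A), then fake-quantize to the sampled bit-width.
        delta = matmul(self.lora_b, self.lora_a)
        merged = [[w + self.scaling * d for w, d in zip(w_row, d_row)]
                  for w_row, d_row in zip(self.weight, delta)]
        d_in = len(self.weight[0])
        flat = fake_quantize([v for row in merged for v in row], bits)
        return [flat[i:i + d_in] for i in range(0, len(flat), d_in)]

    def forward(self, x, bits):
        # x: list of N input vectors of length d_in; returns N vectors of length d_out.
        w = self.effective_weight(bits)
        return [[sum(wi * xi for wi, xi in zip(row, vec)) for row in w] for vec in x]


def sample_bit_config(num_layers, rng):
    # One subnet = one bit-width per layer, drawn uniformly from the search space.
    return [rng.choice(BIT_CHOICES) for _ in range(num_layers)]


# Usage: sample a subnet and run a tiny two-layer stack at its precisions.
rng = random.Random(0)
layers = [LoRALinear([[0.5, -1.0], [0.25, 1.0]], rank=1, seed=i) for i in range(2)]
config = sample_bit_config(len(layers), rng)
x = [[1.0, 1.0]]
for layer, bits in zip(layers, config):
    x = layer.forward(x, bits)
print(config, x)
```

Quantizing the merged weight, rather than keeping the adapter in full precision, is one design choice among several. It means that the extracted subnet is deployed with fully quantized weights. A real implementation would use a tensor library with autograd rather than nested lists.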

