CV · AI · Nov 16, 2025

MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting

arXiv:2511.12400v1
Originality: Highly original
AI Analysis

MSLoRA addresses the need for efficient adaptation across architectures such as CNNs and ViTs, offering a single approach that generalizes well to both.

The paper tackles parameter-efficient adaptation of vision backbones. It introduces MSLoRA, which improves transfer performance on classification, detection, and segmentation tasks while using less than roughly 5% of the backbone's parameter count.

We introduce MSLoRA, a backbone-agnostic, parameter-efficient adapter that reweights feature responses rather than re-tuning the underlying backbone. Existing low-rank adaptation methods are mostly confined to vision transformers (ViTs) and struggle to generalize across architectures. MSLoRA unifies adaptation for both convolutional neural networks (CNNs) and ViTs by combining a low-rank linear projection with a multi-scale nonlinear transformation that jointly modulates spatial and channel attention. The two components are fused through pointwise multiplication and a residual connection, yielding a lightweight module that shifts feature attention while keeping pretrained weights frozen. Extensive experiments demonstrate that MSLoRA consistently improves transfer performance on classification, detection, and segmentation tasks with roughly less than 5% of backbone parameters. The design further enables stable optimization, fast convergence, and strong cross-architecture generalization. By reweighting rather than re-tuning, MSLoRA provides a simple and universal approach for efficient adaptation of frozen vision backbones.
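The fusion described in the abstract (a low-rank linear projection multiplied pointwise by a multi-scale nonlinear branch, then added back through a residual connection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the multi-scale operator, the nonlinearity, or the initialization, so the class name `MSLoRASketch`, the average-pooling windows in `scales`, the sigmoid gate, the per-scale channel weights `mix`, and the zero-initialized `up` matrix are all assumptions chosen for illustration.

```python
import numpy as np

def avg_pool_same(x, k):
    """Stride-1 average pooling of a (C, H, W) map with an odd k x k window (edge padding)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out / (k * k)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MSLoRASketch:
    """Hypothetical adapter applied to one frozen feature map of shape (C, H, W)."""

    def __init__(self, channels, rank=4, scales=(1, 3, 5), seed=0):
        rng = np.random.default_rng(seed)
        # Low-rank factors acting on the channel axis (equivalent to 1x1 convolutions).
        self.down = rng.normal(0.0, 0.02, size=(rank, channels))
        # Assumption: zero init (a common LoRA convention) so the adapter starts as identity.
        self.up = np.zeros((channels, rank))
        # Assumption: one learnable channel weight per pooling scale.
        self.scales = scales
        self.mix = rng.normal(0.0, 0.02, size=(len(scales), channels))

    def num_params(self):
        return self.down.size + self.up.size + self.mix.size

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # Low-rank linear projection: C -> rank -> C at every spatial position.
        hidden = np.einsum("rc,chw->rhw", self.down, x)
        low = np.einsum("cr,rhw->chw", self.up, hidden)
        # Multi-scale nonlinear branch: pooled context at several window sizes,
        # weighted per channel and squashed into a (0, 1) gate over space and channels.
        ctx = sum(self.mix[i][:, None, None] * avg_pool_same(x, k)
                  for i, k in enumerate(self.scales))
        gate = sigmoid(ctx)
        # Pointwise multiplication fuses the two branches; the residual keeps
        # the frozen backbone feature intact.
        return x + low * gate

features = np.random.default_rng(1).normal(size=(8, 6, 6))  # stand-in for a frozen backbone output
adapter = MSLoRASketch(channels=8, rank=2)
adapted = adapter(features)
```

In this sketch only `down`, `up`, and `mix` would be trained, and their count grows linearly with the channel width, whereas a standard 3x3 convolution grows quadratically. Because the adapter only sees a `(C, H, W)` feature map, the same module could in principle sit after a CNN stage or after ViT tokens reshaped to a grid, which is one way to read the abstract's "backbone-agnostic" claim.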

