CV | Nov 24, 2025

INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models

Parsa Madinei, Ryan Solgi, Ziqi Wen, Jonathan Skaza, Miguel Eckstein, Ramtin Pedarsani

arXiv:2511.19676v16.21 citations | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of efficiently compressing large vision-language models for deployment. The contribution is incremental because it builds on existing layer pruning methods.

The paper tackles the performance drop that large vision-language models suffer after layer pruning. It introduces INTERLACE, a framework that prunes redundant layers and recovers performance through sample-efficient finetuning, retaining 88.9% of average performance after dropping 25% of the network while finetuning on only 1% of the FineVision dataset for one epoch.

We introduce INTERLACE, a novel framework that prunes redundant layers in VLMs while maintaining performance through sample-efficient finetuning. Existing layer pruning methods lead to a significant performance drop when applied to VLMs. Instead, we analyze triplets of consecutive layers to identify local redundancy: we remove the more redundant of the first two layers, finetune the remaining layer to compensate for the lost capacity, and freeze the third layer to serve as a stable anchor during finetuning. We found that this interleaved finetune-freeze design enables rapid convergence with minimal data after pruning. By finetuning only a subset of layers on just 1% of the FineVision dataset for one epoch, INTERLACE retains 88.9% of average performance after dropping 25% of the network, achieving SOTA performance. Our code is available at: https://github.com/pmadinei/Interlace.git
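The triplet rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `plan_interlace`, the per-layer `redundancy` scores, the greedy ranking of non-overlapping triplets, and the "keep" role for layers outside any selected triplet are all assumptions made for illustration. The abstract states only the per-triplet rule: drop the more redundant of the first two layers, finetune the other, and freeze the third.

```python
from typing import List


def plan_interlace(redundancy: List[float], n_drop: int) -> List[str]:
    """Assign each layer a role: 'drop', 'finetune', 'freeze', or 'keep'.

    `redundancy` holds one hypothetical score per layer (higher means more
    redundant). How INTERLACE actually scores layers is not stated in the
    abstract, so the scores are treated as given inputs here.
    """
    n = len(redundancy)
    roles = ["keep"] * n

    # Candidate triplets (i, i+1, i+2), ranked by the score of the more
    # redundant of their first two layers. This ranking is an assumption.
    starts = sorted(
        range(n - 2),
        key=lambda i: max(redundancy[i], redundancy[i + 1]),
        reverse=True,
    )

    dropped = 0
    for i in starts:
        if dropped == n_drop:
            break
        if any(roles[j] != "keep" for j in (i, i + 1, i + 2)):
            continue  # overlaps a triplet that was already selected

        # Per-triplet rule from the abstract: drop the more redundant of the
        # first two layers, finetune the other, freeze the third as an anchor.
        if redundancy[i] >= redundancy[i + 1]:
            drop, tune = i, i + 1
        else:
            drop, tune = i + 1, i
        roles[drop] = "drop"
        roles[tune] = "finetune"
        roles[i + 2] = "freeze"
        dropped += 1

    return roles


# Example: 8 layers with made-up scores, dropping 2 of them (25%).
scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.7, 0.4, 0.5]
print(plan_interlace(scores, n_drop=2))
# ['finetune', 'drop', 'freeze', 'finetune', 'drop', 'freeze', 'keep', 'keep']
```

In a real model, these roles would presumably translate into removing the dropped decoder blocks and enabling gradients only on the blocks marked for finetuning. The linked repository is the reference for how the authors actually do this.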
