CV, AI, CL, LG | Jul 6, 2023

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

arXiv:2307.03135v3 | 46 citations | h-index: 75 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of deploying large vision-language models on resource-constrained devices by improving out-of-distribution generalization. The advance is incremental because it builds on existing distillation techniques.

This paper tackles the problem of distilling large vision-language models into smaller, faster versions while maintaining performance, specifically focusing on improving out-of-distribution generalization, which has been overlooked in prior work. The proposed methods, based on enhancing visual representation imitation and enriching language semantics, achieve significant improvements in zero-shot and few-shot classification on open-vocabulary out-of-distribution tasks.

Large vision-language models have achieved outstanding performance, but their size and computational requirements make them impractical to deploy on resource-constrained devices and in time-sensitive tasks. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles, from the vision and language modality perspectives, to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space while carefully promoting coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative, fine-grained semantic attributes that effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches.

Poster: https://xuanlinli17.github.io/pdfs/iccv23_large_vlm_distillation_poster.pdf

Code: https://github.com/xuanlinli17/large_vlm_distillation_ood
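The first principle in the abstract (imitating the teacher's visual representation while keeping vision-language alignment coherent) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the squared-error imitation term, the KL alignment term, the temperature of 0.07, and the weights `alpha` and `beta` are all assumptions chosen for illustration, and the student feature is assumed to already live in the same space as the teacher feature.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_distribution(image_feat, text_feats, temperature=0.07):
    """Softmax over cosine similarities between one image feature and
    every label (text) embedding. The temperature is an assumed value."""
    img = normalize(image_feat)
    logits = [
        sum(a * b for a, b in zip(img, normalize(t))) / temperature
        for t in text_feats
    ]
    return softmax(logits)

def distillation_loss(student_feat, teacher_feat, text_feats,
                      alpha=1.0, beta=1.0):
    """Hypothetical two-term loss for a single image:
    - imitation: squared distance between normalized visual features
    - alignment: KL(teacher || student) over image-to-label distributions
    """
    s, t = normalize(student_feat), normalize(teacher_feat)
    imitation = sum((a - b) ** 2 for a, b in zip(s, t))
    p_teacher = label_distribution(teacher_feat, text_feats)
    p_student = label_distribution(student_feat, text_feats)
    alignment = sum(
        p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0
    )
    return alpha * imitation + beta * alignment

# Toy usage: two label embeddings and 2-D image features.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
teacher = [0.9, 0.1]
close_student = [0.8, 0.2]
far_student = [0.1, 0.9]
loss_close = distillation_loss(close_student, teacher, text_feats)
loss_far = distillation_loss(far_student, teacher, text_feats)
```

In this toy setup a student feature that points in nearly the same direction as the teacher's gets a smaller loss than one that points towards the wrong label. Zero-shot prediction with the student then amounts to taking the most probable entry of `label_distribution`. The second principle would change only `text_feats`, for example by embedding label descriptions enriched with semantic attributes instead of bare label names.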

Code Implementations: 1 repo
