CV | AI | Oct 14, 2024

LOBG: Less Overfitting for Better Generalization in Vision-Language Model

arXiv:2410.10247v2 | 1 citation | h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses overfitting in vision-language models to improve transfer to downstream tasks, but it appears incremental because it builds on existing prompt learning frameworks.

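The paper tackles overfitting in vision-language model prompt learning. It proposes LOBG, which uses CLIP to filter out fine-grained foreground information and adds two regularizers: a structural topology preservation (STP) loss at the feature level and hierarchical logit distillation (HLD) at the output level. The authors report significantly improved generalization compared to state-of-the-art methods.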

Existing prompt learning methods in Vision-Language Models (VLM) have effectively enhanced the transfer capability of VLM to downstream tasks, but they suffer from a significant decline in generalization due to severe overfitting. To address this issue, we propose a framework named LOBG for vision-language models. Specifically, we use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts. To further mitigate overfitting, we developed a structural topology preservation (STP) loss at the feature level, which endows the feature space with overall plasticity, allowing effective reshaping of the feature space during optimization. Additionally, we employed hierarchical logit distillation (HLD) at the output level to constrain outputs, complementing STP at the output end. Extensive experimental results demonstrate that our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.
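
The abstract does not give formulas for STP or HLD, so the following is only a minimal sketch of the two general ideas it names, not the paper's actual losses. It assumes that "structural topology" can be approximated by the pairwise cosine similarities among a batch of features, and it reduces "hierarchical logit distillation" to a plain single-level KL divergence between a frozen teacher's logits and the prompted student's logits. All function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def pairwise_cosine(features: np.ndarray) -> np.ndarray:
    """Return the (n, n) cosine-similarity matrix of the row vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def structure_loss(frozen_feats: np.ndarray, tuned_feats: np.ndarray) -> float:
    """Mean squared gap between the two pairwise-similarity matrices.

    Assumption: this relational penalty stands in for the STP loss;
    the paper's definition may differ.
    """
    gap = pairwise_cosine(frozen_feats) - pairwise_cosine(tuned_feats)
    return float(np.mean(gap ** 2))


def log_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Numerically stable row-wise log-softmax of temperature-scaled logits."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    return scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))


def logit_distillation_loss(
    teacher_logits: np.ndarray,
    student_logits: np.ndarray,
    temperature: float = 1.0,
) -> float:
    """KL(teacher || student), averaged over the batch.

    Assumption: a single-level simplification; the hierarchical part
    of HLD is not modeled here.
    """
    t_log = log_softmax(teacher_logits, temperature)
    s_log = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(t_log) * (t_log - s_log), axis=1)
    return float(np.mean(kl))
```

One property of this kind of relational penalty is that it constrains how samples relate to each other rather than where each feature sits: rotating the whole feature space leaves the pairwise similarities, and therefore the loss, unchanged. That is one plausible reading of the "overall plasticity" the abstract mentions, though the abstract itself does not spell this out.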

