CV · Mar 10, 2024

A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets

arXiv:2403.06295v1 · 4 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses efficient and effective learning on fine-grained datasets in few-shot class-incremental settings, a problem that matters for applications built on vision-language models, although its method appears incremental.

The paper tackles few-shot class-incremental learning on fine-grained datasets by proposing two modules, Session-Specific Prompts and Hyperbolic distance, resulting in an average 10-point improvement over baselines while using at least 8 times fewer trainable parameters.

Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams, all without overfitting. The rise of Vision-Language Models (VLMs) has unlocked numerous applications that leverage their existing knowledge to fine-tune on custom data. However, training the whole model is computationally prohibitive, and VLMs, while versatile in general domains, still struggle with fine-grained datasets that are crucial for many applications. We tackle these challenges with two simple modules. The first, Session-Specific Prompts (SSP), enhances the separability of image-text embeddings across sessions. The second, Hyperbolic distance, compresses the representations of image-text pairs within the same class while expanding those from different classes, leading to better representations. Experimental results show an average 10-point increase over baselines while requiring at least 8 times fewer trainable parameters. The improvement is further underscored on our three newly introduced fine-grained datasets.

Code Implementations: 2 repos