CV · Mar 12

Noise-aware few-shot learning through bi-directional multi-view prompt alignment

arXiv:2603.11617v19.6
Predicted impact top 71% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This paper addresses robust few-shot learning for vision-language models under noisy supervision and represents an incremental improvement over existing methods.

The paper tackles few-shot learning with noisy labels in vision-language models by proposing NA-MVP, a framework that uses bi-directional multi-view prompt alignment to separate clean from noisy signals. The authors report that it consistently outperforms state-of-the-art baselines on synthetic and real-world noisy benchmarks.

Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and degrade cross-modal alignment. Existing approaches struggle because they often lack the ability to model fine-grained semantic cues and to adaptively separate clean from noisy signals. To address these challenges, we propose NA-MVP, a framework for Noise-Aware few-shot learning through bi-directional Multi-View Prompt alignment. NA-MVP is built upon a key conceptual shift: robust prompt learning requires moving from global matching to region-aware alignment that explicitly distinguishes clean cues from noisy ones. To realize this, NA-MVP employs (1) multi-view prompts combined with unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence while suppressing unreliable regions; (2) a bi-directional prompt design that captures complementary clean-oriented and noise-aware cues, enabling the model to focus on stable semantics; and (3) an alignment-guided selective refinement strategy that uses optimal transport to correct only mislabeled samples while retaining reliable data. Experiments on synthetic and real-world noisy benchmarks demonstrate that NA-MVP consistently outperforms state-of-the-art baselines, confirming its effectiveness in enabling robust few-shot learning under noisy supervision.
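The patch-to-prompt correspondence in component (1) can be illustrated with a small unbalanced optimal transport solver. The following is a minimal sketch, not the authors' implementation: it assumes a cosine-distance cost between patch and prompt embeddings, uniform marginals, and KL-relaxed marginal constraints solved by Sinkhorn-style scaling. The function names (`cosine_cost`, `unbalanced_sinkhorn`) and the parameters `eps` and `rho` are hypothetical choices made for this illustration.

```python
import numpy as np


def cosine_cost(patches, prompts):
    """Cost matrix of shape (n_patches, n_prompts): 1 - cosine similarity."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    q = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    return 1.0 - p @ q.T


def unbalanced_sinkhorn(cost, a, b, eps=0.5, rho=1.0, n_iters=500):
    """Entropic unbalanced OT with KL-relaxed marginals (illustrative sketch).

    cost : (n, m) cost matrix
    a, b : target marginals over rows (patches) and columns (prompts)
    eps  : entropic regularisation strength (assumed value)
    rho  : marginal relaxation strength; a smaller rho lets the plan drop
           mass on rows or columns that are expensive to match
    """
    K = np.exp(-cost / eps)        # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)      # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)


# Toy usage: 49 image patches and 4 prompt views in a 16-dimensional space.
rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 16))
prompts = rng.normal(size=(4, 16))
a = np.full(49, 1.0 / 49)
b = np.full(4, 1.0 / 4)

plan = unbalanced_sinkhorn(cosine_cost(patches, prompts), a, b)
patch_mass = plan.sum(axis=1)      # mass each patch keeps under the relaxed plan
```

Under this formulation, patches that are costly to match against every prompt view keep less mass in `patch_mass`, which is one way a relaxed transport plan could down-weight unreliable regions. How NA-MVP actually defines its cost, marginals, and relaxation is not specified in the abstract.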

