CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning

arXiv:2605.2570821.8

AI Analysis

CMAP is a parameter-efficient method for multi-domain task-incremental learning that addresses forgetting of prior tasks and the need to infer task identity at inference time, improving performance without external data.

CMAP introduces cross-modal adaptive prompting for multi-domain task-incremental learning, using CLIP's text embedding space for task routing, confidence estimation, and encoder adaptation. On the MTIL benchmark under Order-I, it achieves 74.2% Transfer, 80.5% Average, and 88.7% Last, surpassing the prior state of the art by 5.0, 3.7, and 3.0 percentage points respectively, with only 2.5M trainable parameters.

Multi-domain task-incremental learning requires a model to sequentially acquire knowledge across visually diverse domains without forgetting prior tasks, and without access to task identity at inference. Parameter-efficient methods built on frozen vision-language models have made strong progress, yet all existing approaches rely exclusively on visual features for task routing, confidence estimation, and encoder adaptation, leaving CLIP's cross-modal text embedding space entirely unexploited. We address this gap through three contributions. Text-space task routing replaces visual Gaussian matching with cosine similarity to frozen CLIP text prototypes, giving order-independent routing robust to data scarcity at zero parameter cost. Multi-prototype visual-textual confidence replaces single-Gaussian class modeling with K-means visual prototypes and cross-modal alignment scores under task-calibrated thresholds. Symmetric cross-modal gating extends per-layer Gumbel gates to the text encoder conditioned on batch image features, preserving cross-modal alignment on out-of-distribution inputs. On the MTIL benchmark spanning 11 datasets and 1201 classes, our method achieves 74.2% Transfer, 80.5% Average, and 88.7% Last under Order-I, surpassing the prior state of the art by 5.0, 3.7, and 3.0 percentage points with only 2.5M trainable parameters and no external data.
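The text-space routing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`l2_normalize`, `cosine`, `route_task`) are hypothetical, the 3-dimensional vectors are toy stand-ins for frozen CLIP image and text embeddings, and the assumption that each task is represented by a single text prototype vector is ours, since the abstract does not say how prototypes are constructed. The sketch only shows the core step: pick the task whose text prototype has the highest cosine similarity to the image feature.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def route_task(
    image_feat: Sequence[float],
    text_prototypes: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Return the index of the most similar task prototype and all scores.

    Assumption: one frozen text prototype per task. No parameters are
    trained here, and the score for a given prototype does not depend on
    the order in which tasks were learned.
    """
    sims = [cosine(image_feat, proto) for proto in text_prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy stand-ins for CLIP embeddings (hypothetical values, not real features).
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_feature = [0.1, 0.9, 0.2]

task_id, scores = route_task(image_feature, prototypes)
print(task_id, [round(s, 3) for s in scores])
```

In this toy case the image feature is routed to task index 1, the prototype it is most aligned with. Because each score depends only on the image feature and one frozen prototype, reordering the prototypes changes the returned index but not which prototype wins, which is one way to read the abstract's claim of order-independent routing.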
