CL, AI, CV, LG · Jun 18, 2022

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

UW
arXiv:2206.09059v2 · 89 citations · h-index: 27
AI Analysis

This work addresses catastrophic forgetting in multimodal continual learning. The contribution is incremental, since it extends existing CL benchmarks to multimodal settings.

The authors address the lack of benchmarks for continual learning on vision-and-language tasks by introducing CLiMB, a benchmark that evaluates how models learn multimodal tasks sequentially. They find that common CL methods can help mitigate forgetting but do not enable cross-task knowledge transfer.

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to study the challenge of learning multimodal tasks in a CL setting, and to systematically evaluate how upstream continual learning can rapidly generalize to new multimodal and unimodal tasks. CLiMB includes implementations of several CL algorithms and a modified Vision-Language Transformer (ViLT) model that can be deployed on both multimodal and unimodal tasks. We find that common CL methods can help mitigate forgetting during multimodal task learning, but do not enable cross-task knowledge transfer. We envision that CLiMB will facilitate research on a new class of CL algorithms for this challenging multimodal setting.
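
The abstract refers to two quantities, forgetting and cross-task knowledge transfer, without defining them here. The sketch below shows one common, generic way to compute them from a table of per-task scores. It is an illustration only and not necessarily the metric CLiMB itself uses; the function names `forgetting` and `transfer`, the score table layout, and all numbers are hypothetical.

```python
from typing import List


def forgetting(acc: List[List[float]]) -> List[float]:
    """Score lost on each earlier task by the end of the sequence.

    acc[i][j] is the score on task j after training on tasks 0..i in order.
    Forgetting for task j is the score right after learning it, acc[j][j],
    minus the score after the final task. The last task is skipped because
    nothing was trained after it.
    """
    last = len(acc) - 1
    return [acc[j][j] - acc[last][j] for j in range(last)]


def transfer(acc: List[List[float]], direct: List[float]) -> List[float]:
    """Gain on each task from having seen earlier tasks first.

    direct[j] is the score from training on task j alone (assumed baseline).
    Positive values mean earlier tasks helped; negative values mean they hurt.
    """
    return [acc[j][j] - direct[j] for j in range(len(direct))]


# Made-up scores (in percent) for a three-task sequence.
acc = [
    [70.0, 0.0, 0.0],
    [60.0, 80.0, 0.0],
    [50.0, 75.0, 90.0],
]
direct = [70.0, 82.0, 90.0]

print("forgetting per earlier task:", forgetting(acc))
print("transfer per task:", transfer(acc, direct))
```

Under these definitions, a method that "mitigates forgetting but does not enable transfer" would show small forgetting values alongside transfer values at or below zero.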

Code Implementations: 1 repo