CL | AI | Nov 2, 2023

MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition

arXiv:2311.01580v1132 citations | h-index: 5 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of grounded compositional concept recognition, offering a method to improve generalization in vision-language tasks. Its contribution is incremental: it combines retrieval with meta-learning.

The paper tackles the problem of learning novel compositional concepts from visual and language data by proposing MetaReVision, a retrieval-enhanced meta-learning model, which outperforms competitive baselines on newly created benchmarks CompCOCO and CompFlickr.

Humans have the ability to learn novel compositional concepts by recalling and generalizing primitive concepts acquired from past experiences. Inspired by this observation, in this paper, we propose MetaReVision, a retrieval-enhanced meta-learning model to address the visually grounded compositional concept learning problem. The proposed MetaReVision consists of a retrieval module and a meta-learning module, which are designed to incorporate retrieved primitive concepts as a support set to meta-train vision-language models for grounded compositional concept recognition. Through meta-learning from episodes constructed by the retriever, MetaReVision learns a generic compositional representation that can be quickly updated to recognize novel compositional concepts. We create CompCOCO and CompFlickr to benchmark grounded compositional concept learning. Our experimental results show that MetaReVision outperforms other competitive baselines and that the retrieval module plays an important role in this compositional learning process.
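
To make the retrieve-then-adapt loop described in the abstract concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes concepts are fixed feature vectors, the retriever is plain cosine nearest-neighbour search, the model is a linear softmax classifier, and the outer update is a Reptile-style first-order step standing in for whatever meta-learner the paper uses. All function names (`retrieve_support`, `adapt`, `meta_step`) and hyperparameters are hypothetical.

```python
import numpy as np

def retrieve_support(query, memory, k):
    """Return indices of the k memory rows most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, Y):
    """Cross-entropy loss of a linear softmax classifier and its gradient."""
    P = softmax(X @ W)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    grad = X.T @ (P - Y) / len(X)
    return loss, grad

def adapt(W, X, Y, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on the retrieved support set."""
    W = W.copy()
    for _ in range(steps):
        _, g = loss_and_grad(W, X, Y)
        W -= lr * g
    return W

def meta_step(W, query, memory_x, memory_y, k=4, meta_lr=0.1):
    """One episode: retrieve a support set, adapt, then move the shared
    weights toward the adapted ones (Reptile-style, an assumption here)."""
    idx = retrieve_support(query, memory_x, k)
    W_fast = adapt(W, memory_x[idx], memory_y[idx])
    return W + meta_lr * (W_fast - W)

# Toy data: 20 stored "primitive concept" features with 3 class labels.
rng = np.random.default_rng(0)
memory_x = rng.normal(size=(20, 8))
memory_y = np.eye(3)[rng.integers(0, 3, size=20)]
W = np.zeros((8, 3))

# Each query builds its own episode from the retriever's output.
for i in range(20):
    query = memory_x[i] + 0.01 * rng.normal(size=8)
    W = meta_step(W, query, memory_x, memory_y)
```

At test time, the same sketch would call `retrieve_support` for a novel query and run `adapt` from the meta-learned `W`, which mirrors the abstract's claim that the learned representation can be quickly updated for a new compositional concept.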

Code Implementations: 1 repo