AI · LG · Oct 6, 2020

CURI: A Benchmark for Productive Concept Learning Under Uncertainty

arXiv:2010.02855v1 · 32 citations
Originality: Incremental advance
AI Analysis

The paper addresses the scarcity of benchmarks for systematic generalization in AI, a gap that matters most to researchers in few-shot learning and meta-learning. The contribution is incremental because it builds on existing evaluation frameworks.

The paper tackles the lack of evaluations for compositional concept learning under uncertainty by introducing the CURI benchmark, which assesses productive generalization across modalities and defines a model-independent compositionality gap. Its experiments show significant room for improvement in current models.

Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head"). In contrast, standard classification benchmarks: 1) consider only a fixed set of category labels, 2) do not evaluate compositional concept learning and 3) do not explicitly capture a notion of reasoning under uncertainty. We introduce a new few-shot, meta-learning benchmark, Compositional Reasoning Under Uncertainty (CURI) to bridge this gap. CURI evaluates different aspects of productive and systematic generalization, including abstract understandings of disentangling, productive generalization, learning boolean operations, variable binding, etc. Importantly, it also defines a model-independent "compositionality gap" to evaluate the difficulty of generalizing out-of-distribution along each of these axes. Extensive evaluations across a range of modeling choices spanning different modalities (image, schemas, and sounds), splits, privileged auxiliary concept information, and choices of negatives reveal substantial scope for modeling advances on the proposed task. All code and datasets will be available online.
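The abstract does not say how the "compositionality gap" is computed. One reading, assumed here and not taken from this page, is that it compares two oracles that infer a concept from the same few labelled examples. One oracle searches the full compositional concept space. The other searches only concepts seen during training. The gap is their accuracy difference on held-out queries. The minimal sketch below follows that assumption. The scene encoding, the concept names (`same_color`, `has_red`, `has_cube`, `conj`), and the max-agreement inference rule are all hypothetical stand-ins, not CURI's actual definitions or API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: a scene is a list of objects, each a dict of attributes.
Scene = List[Dict[str, str]]
Concept = Callable[[Scene], bool]


def obj(color: str, shape: str) -> Dict[str, str]:
    return {"color": color, "shape": shape}


def same_color(scene: Scene) -> bool:
    """Relational concept: all objects in the scene share one color."""
    return len({o["color"] for o in scene}) <= 1


def has_red(scene: Scene) -> bool:
    return any(o["color"] == "red" for o in scene)


def has_cube(scene: Scene) -> bool:
    return any(o["shape"] == "cube" for o in scene)


def conj(a: Concept, b: Concept) -> Concept:
    """Boolean composition: true iff the scene satisfies both concepts."""
    return lambda scene: a(scene) and b(scene)


def oracle(hypotheses: List[Concept],
           support: List[Tuple[Scene, bool]],
           query: List[Scene]) -> List[bool]:
    """Pick the hypothesis agreeing with the most support labels, then label
    the queries with it. Ties go to the earliest hypothesis. This crude rule
    is a stand-in for proper probabilistic inference (assumption)."""
    def agreement(c: Concept) -> int:
        return sum(c(s) == y for s, y in support)
    best = max(hypotheses, key=agreement)
    return [best(s) for s in query]


def accuracy(preds: List[bool], labels: List[bool]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def compositionality_gap(full_space, train_space, true_concept, support, query):
    """Accuracy of the full-space oracle minus that of the train-only oracle."""
    labels = [true_concept(s) for s in query]
    strong = accuracy(oracle(full_space, support, query), labels)
    weak = accuracy(oracle(train_space, support, query), labels)
    return strong - weak


# Usage: the test concept is a novel conjunction of two training concepts.
train_concepts = [same_color, has_red, has_cube]
red_and_cube = conj(has_red, has_cube)
full_concepts = train_concepts + [red_and_cube]

s1 = [obj("red", "cube"), obj("red", "sphere")]
s2 = [obj("red", "sphere"), obj("blue", "sphere")]
s3 = [obj("blue", "cube"), obj("blue", "sphere")]
s4 = [obj("green", "sphere"), obj("blue", "sphere")]

support = [(s, red_and_cube(s)) for s in (s1, s2, s3)]
query = [s1, s2, s3, s4]
gap = compositionality_gap(full_concepts, train_concepts, red_and_cube, support, query)
print(gap)  # 0.25: the train-only oracle falls back to same_color and mislabels s3
```

Both oracles see identical support sets and differ only in their hypothesis spaces. Under this assumption, that is what would make such a gap model-independent: it depends on the split and the concept space, not on any learned model.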

Code implementations: 1 repo
