HC, AI, LG | Mar 10, 2021

Fast and flexible: Human program induction in abstract reasoning tasks

arXiv:2103.05823v1 | 43 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides baseline human data for a challenging AI benchmark. The contribution is incremental: it reports initial behavioral results without proposing new methods.

The study measured human performance on a subset of program induction tasks from the Abstraction and Reasoning Corpus (ARC). Participants solved 80% of tasks on average, and 65% of the tasks were solved by more than 80% of participants.

The Abstraction and Reasoning Corpus (ARC) is a challenging program induction dataset that was recently proposed by Chollet (2019). Here, we report the first set of results collected from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000). Although this subset of tasks contains considerable variation, our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 80% of tasks solved per participant, and with 65% of tasks being solved by more than 80% of participants. Additionally, we find interesting patterns of behavioral consistency and variability within the action sequences during the generation process, the natural language descriptions to describe the transformations for each task, and the errors people made. Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task to compose a correct solution. Future modeling work could incorporate these findings, potentially by connecting the natural language descriptions we collected here to the underlying semantics of ARC.
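The two headline figures in the abstract are different statistics: one averages over participants, the other counts tasks. The sketch below shows how each could be computed from a participant-by-task matrix of outcomes. The function names and the toy matrix are hypothetical and are not the study's data; the sketch also assumes every participant attempted every task, which the abstract does not state.

```python
def mean_solve_rate_per_participant(solved):
    """Average, over participants, of the fraction of tasks each one solved.

    `solved` is a list of rows (one per participant) of 0/1 outcomes,
    one column per task. Assumes a complete matrix (hypothetical setup).
    """
    rates = [sum(row) / len(row) for row in solved]
    return sum(rates) / len(rates)


def fraction_of_tasks_above(solved, threshold=0.8):
    """Fraction of tasks solved by strictly more than `threshold` of participants."""
    n_participants = len(solved)
    n_tasks = len(solved[0])
    task_rates = [
        sum(row[t] for row in solved) / n_participants for t in range(n_tasks)
    ]
    return sum(rate > threshold for rate in task_rates) / n_tasks


# Invented toy data: 5 participants (rows) x 4 tasks (columns).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]

print(mean_solve_rate_per_participant(toy))
print(fraction_of_tasks_above(toy))
```

In the toy matrix, the per-participant average is high while only one of the four tasks clears the "more than 80% of participants" bar, which illustrates why the abstract reports the two numbers separately.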

