AI CV LG · Nov 14, 2023

Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

Luca H. Thoms, Karel A. Veldkamp, Hannes Rosenbusch, Claire E. Stevenson

arXiv:2311.08083v1 · 10.98 citations · h-index: 9 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of generalizing visual analogy solving for AI systems, though it is incremental as it adapts an existing verbal method to the visual domain with limited performance gains.

The paper tackled visual analogical reasoning by applying vector arithmetic from word embeddings to visual data using a variational autoencoder on the Abstraction and Reasoning Corpus (ARC), achieving scores of 2% on ARC and 8.8% on ConceptARC.

Analogical reasoning derives information from known relations and generalizes this information to similar yet unfamiliar situations. One of the first generalized ways in which deep learning models were able to solve verbal analogies was through vector arithmetic of word embeddings, essentially relating words that were mapped to a vector space (e.g., king - man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example to investigate visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with fewer dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high reconstruction accuracy of the VAE. Predictions on more complex items showed stronger deviations from expected outputs, although they still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (cf. the current world record of 21%), and on ConceptARC it scored 8.8%. Although the proposed methodology involves only basic dimensionality reduction techniques and standard vector arithmetic, the approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
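
The latent vector arithmetic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a random orthogonal linear map stands in for the trained VAE encoder and decoder, the helper name `solve_by_latent_arithmetic` is hypothetical, and averaging the offsets over the example pairs is an assumption about how several examples might be combined. The toy rule ("set the center cell to 2") is purely additive, so a linear stand-in recovers it exactly; a real VAE would only approximate it.

```python
import numpy as np

def solve_by_latent_arithmetic(encode, decode, train_pairs, test_input):
    """Estimate the rule as a latent offset and apply it to the test input.

    Mirrors "king - man + woman": z_pred = z_test + mean(z_output - z_input).
    Averaging over example pairs is an assumption made for this sketch.
    """
    offsets = [encode(out) - encode(inp) for inp, out in train_pairs]
    rule = np.mean(offsets, axis=0)
    return decode(encode(test_input) + rule)

# Stand-in for a trained VAE (assumption): a fixed random orthogonal map
# on flattened 3x3 grids, so that decode(encode(x)) reproduces x.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(9, 9)))

def encode(grid):
    # Flatten the grid and project it into the "latent" space.
    return Q @ np.asarray(grid, dtype=float).ravel()

def decode(z):
    # Invert the projection and round back to integer color codes.
    return np.rint(Q.T @ z).astype(int).reshape(3, 3)

# Toy item: every output equals its input with the center cell set to 2.
train_pairs = [
    (np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]]),
     np.array([[1, 0, 0], [0, 2, 0], [0, 0, 0]])),
    (np.array([[0, 0, 3], [0, 0, 0], [0, 0, 0]]),
     np.array([[0, 0, 3], [0, 2, 0], [0, 0, 0]])),
]
test_input = np.array([[0, 0, 0], [0, 0, 0], [4, 0, 0]])
expected = np.array([[0, 0, 0], [0, 2, 0], [4, 0, 0]])

prediction = solve_by_latent_arithmetic(encode, decode, train_pairs, test_input)
```

Because the stand-in encoder is linear and invertible, the offset learned from the two example pairs transfers exactly to the unseen test grid. With a nonlinear VAE the same arithmetic is only approximate, which is consistent with the abstract's observation that results depend on reconstruction accuracy and on how similar the input-to-output examples are.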
