CV, AI | May 28, 2019

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

arXiv:1905.12042v1 | 6 citations
Originality: Incremental advance
AI Analysis

This work addresses a challenging problem in AI for scene understanding and reasoning, with potential applications in robotics and cognitive systems, but it is incremental as it builds on existing perception methods and focuses on a specific domain.

The paper tackles the task of predicting the sequence of actions needed to rearrange objects between two images, introducing the Image-based Event-Sequencing (IES) problem and the Blocksworld Image Reasoning Dataset (BIRD). It proposes a modular two-step approach that combines visual perception with event-sequencing and reports improved performance over end-to-end deep learning methods, though specific numerical gains are not provided.

The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict the sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence from cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations and the sequence of moves needed to rearrange one configuration into the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We then propose a modular two-step approach, Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach to natural images, we seek to pave the way for future research on event sequencing for real-world scenes.
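To make the event-sequencing step concrete, here is a minimal sketch, assuming a perception stage has already turned each image into a symbolic configuration of stacks. The state encoding (a tuple of stacks, each listed bottom to top), the move format `(block, source_stack, destination_stack)`, and the names `successors` and `plan_moves` are all illustrative assumptions. The breadth-first search is a simple stand-in for a reasoning module, not the method used in the paper.

```python
from collections import deque


def successors(state):
    """Yield (move, next_state) for every legal move of a top block.

    `state` is a tuple of stacks; each stack is a tuple of block names,
    bottom first. This encoding is an assumption made for illustration.
    """
    for src, stack in enumerate(state):
        if not stack:
            continue
        block = stack[-1]  # only the top block of a stack can be moved
        for dst in range(len(state)):
            if dst == src:
                continue
            nxt = list(state)
            nxt[src] = stack[:-1]
            nxt[dst] = state[dst] + (block,)
            yield (block, src, dst), tuple(nxt)


def plan_moves(source, target):
    """Return a shortest list of moves from source to target, or None."""
    if source == target:
        return []
    frontier = deque([source])
    parent = {source: None}  # state -> (previous_state, move)
    while frontier:
        state = frontier.popleft()
        for move, nxt in successors(state):
            if nxt in parent:
                continue
            parent[nxt] = (state, move)
            if nxt == target:
                # Walk back through the parents to recover the sequence.
                moves = []
                cur = nxt
                while parent[cur] is not None:
                    prev, mv = parent[cur]
                    moves.append(mv)
                    cur = prev
                moves.reverse()
                return moves
            frontier.append(nxt)
    return None  # target is not reachable from source


# Hypothetical configurations, as a perception module might output them.
source = (("red", "green"), ("blue",), ())
target = (("red",), ("blue", "green"), ())
print(plan_moves(source, target))
```

In this sketch the planner works only on symbolic states, so it is independent of how many blocks appear in the images. This is one way to read the motivation for separating perception from sequencing, although exhaustive search of this kind becomes expensive as the number of blocks grows.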

