LG, ML · Jul 13, 2020

S2RMs: Spatially Structured Recurrent Modules

Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schölkopf

arXiv:2007.06533v1 · 10.6 · 15 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building dynamic models that exploit modular and spatiotemporal structures for applications like video prediction and multi-agent systems, representing an incremental advancement in leveraging inductive biases for improved generalization.

The paper tackles the problem of modeling dynamical systems that have both modular and spatiotemporal structure by introducing S2RMs, which abstract a system as a collection of autonomous, sparsely interacting sub-systems whose interaction topology is learned but informed by spatial structure. On video prediction and multi-agent world modeling tasks, the resulting models are more robust to the number of available views and generalize better to novel tasks without additional training, even when compared with strong baselines.

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. We accomplish this by abstracting the modeled dynamical system as a collection of autonomous but sparsely interacting sub-systems. The sub-systems interact according to a topology that is learned, but also informed by the spatial structure of the underlying real-world system. This results in a class of models that are well suited for modeling the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modeling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalization to novel tasks without additional training, even when compared against strong baselines that perform equally well or better on the training distribution.
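The abstract describes modules that interact sparsely along a topology that is learned but also informed by the spatial locations of their views. The sketch below is one hypothetical way to express that idea in plain Python; it is not the architecture from the paper. It assumes each module scores every other module by state similarity minus a penalty proportional to spatial distance (`distance_scale`), keeps only its `top_k` best-scoring partners, and mixes their states in with softmax weights. The names `sparse_spatial_interaction`, `top_k` and `distance_scale` are illustrative assumptions, and in a trained model the similarity would come from learned projections rather than raw dot products.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_spatial_interaction(
    states: Sequence[Vector],
    positions: Sequence[Tuple[float, float]],
    top_k: int = 2,
    distance_scale: float = 1.0,
) -> Tuple[List[Vector], List[List[int]]]:
    """One round of sparse, spatially informed communication (hypothetical).

    Returns the updated module states and, for each module, the sorted
    indices of the partners it chose to interact with.
    """
    n = len(states)
    new_states: List[Vector] = []
    topology: List[List[int]] = []
    for i in range(n):
        # Score every other module: content similarity, penalised by distance.
        candidates = []
        for j in range(n):
            if j == i:
                continue
            dist = math.dist(positions[i], positions[j])
            score = dot(states[i], states[j]) - distance_scale * dist
            candidates.append((score, j))
        # Sparsity: keep only the top_k partners (ties broken by lower index).
        candidates.sort(key=lambda c: (-c[0], c[1]))
        kept = candidates[:top_k]
        weights = softmax([s for s, _ in kept]) if kept else []
        # Residual update: own state plus a softmax-weighted sum of partners.
        message = [0.0] * len(states[i])
        for w, (_, j) in zip(weights, kept):
            for d, x in enumerate(states[j]):
                message[d] += w * x
        new_states.append([s + m for s, m in zip(states[i], message)])
        topology.append(sorted(j for _, j in kept))
    return new_states, topology


if __name__ == "__main__":
    # Three clustered modules and one far-away module, all with equal states.
    states = [[1.0, 0.0]] * 4
    positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    _, topology = sparse_spatial_interaction(states, positions, top_k=2)
    print(topology)  # [[1, 2], [0, 2], [0, 1], [1, 2]]
```

Because the penalty grows with distance, none of the three clustered modules selects the distant module at (10, 10) as a partner. Raising `top_k` or lowering `distance_scale` would make the interaction graph denser and less tied to spatial layout.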

