R-SQAIR: Relational Sequential Attend, Infer, Repeat
This work addresses relational inference in unsupervised learning from videos by extending an existing sequential multi-object attention model, SQAIR, with an explicit relational module.
The authors model object relations in sequential multi-object attention models with R-SQAIR, a relational extension of SQAIR that computes pairwise interactions between inferred objects in parallel. They report gains over sequential relational mechanisms, including in combinatorial generalization.
Traditional sequential multi-object attention models rely on a recurrent mechanism to infer object relations. We propose a relational extension (R-SQAIR) of one such attention model (SQAIR) by endowing it with a module with strong relational inductive bias that computes pairwise interactions between inferred objects in parallel. We study two recently proposed relational modules on tasks of unsupervised learning from videos. We demonstrate gains over sequential relational mechanisms, including in terms of combinatorial generalization.
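To make the idea of computing pairwise interactions in parallel concrete, the following is a minimal NumPy sketch of a generic relational update over a set of inferred object vectors. It is an illustration under stated assumptions, not the relational modules studied in this work: the function name `pairwise_relational_update`, the single-layer `tanh` relation function, the sum aggregation, and all shapes and weights are hypothetical choices made for the example.

```python
import numpy as np


def pairwise_relational_update(objects, w_pair, w_out):
    """Update K object vectors from all pairwise interactions at once.

    objects: (K, D) array, one row per inferred object (hypothetical layout).
    w_pair:  (2 * D, H) weights of a shared one-layer relation function.
    w_out:   (D + H, D) weights of the per-object update function.
    """
    k, d = objects.shape
    # Build every ordered (receiver, sender) pair with broadcasting, no loop.
    receivers = np.broadcast_to(objects[:, None, :], (k, k, d))
    senders = np.broadcast_to(objects[None, :, :], (k, k, d))
    pairs = np.concatenate([receivers, senders], axis=-1)  # (K, K, 2D)
    # One matmul applies the same relation function to all K * K pairs.
    effects = np.tanh(pairs @ w_pair)  # (K, K, H)
    # Zero out self-interactions (i == j), then sum effects per receiver.
    mask = 1.0 - np.eye(k)
    aggregated = (effects * mask[:, :, None]).sum(axis=1)  # (K, H)
    # Combine each object with its aggregated effects.
    return np.tanh(np.concatenate([objects, aggregated], axis=-1) @ w_out)


# Toy usage with random values (assumed sizes: 3 objects, 4 features, 8 hidden).
rng = np.random.default_rng(0)
objects = rng.normal(size=(3, 4))
w_pair = rng.normal(size=(8, 8))
w_out = rng.normal(size=(12, 4))
updated = pairwise_relational_update(objects, w_pair, w_out)
```

Because every pair is processed by the same function and the effects are summed, reordering the input objects simply reorders the outputs. A recurrent mechanism that visits objects one after another does not have this property by construction.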