CV · Apr 1, 2021

Visual Attention in Imaginative Agents

arXiv:2104.00177v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses visual attention for imaginative agents, but it appears incremental, since it builds on existing methods such as variational autoencoders and normalizing flows.

The paper tackles the problem of enabling agents to perceive surroundings through discrete fixations by imagining plausible scenes and planning fixations based on uncertainty, resulting in latent representations useful for pixel-level and scene-level tasks.

We present a recurrent agent that perceives its surroundings through a series of discrete fixations. At each timestep, the agent imagines a variety of plausible scenes consistent with the fixation history. The next fixation is planned using uncertainty in the content of the imagined scenes. As time progresses, the agent becomes more certain about the content of its surroundings, and the variety in the imagined scenes reduces. The agent is built using a variational autoencoder and normalizing flows, and trained in an unsupervised manner on a proxy task of scene reconstruction. The latent representations of the imagined scenes are found to be useful for performing pixel-level and scene-level tasks by higher-order modules. The agent is tested on various 2D and 3D datasets.
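
The fixation loop described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the variational autoencoder and normalizing flows are replaced by a hypothetical stand-in, `imagine_scenes`, that copies the pixels seen so far and fills every unseen pixel with a random binary value. Per-pixel variance across the imagined scenes is assumed here as the uncertainty measure, and a fixation is assumed to reveal a single pixel; the abstract specifies neither. All function names are illustrative.

```python
import random
from statistics import pvariance


def imagine_scenes(scene, observed, n_samples, rng):
    """Hypothetical stand-in for the generative model (not the paper's VAE).

    Observed pixels are copied from the true scene; unobserved pixels are
    filled with random binary values, so the samples disagree only where
    the agent has not looked yet.
    """
    samples = []
    for _ in range(n_samples):
        samples.append([
            [scene[r][c] if (r, c) in observed else rng.choice([0.0, 1.0])
             for c in range(len(scene[r]))]
            for r in range(len(scene))
        ])
    return samples


def uncertainty_map(samples):
    """Per-pixel variance across imagined scenes (assumed uncertainty measure)."""
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[pvariance([s[r][c] for s in samples]) for c in range(cols)]
            for r in range(rows)]


def next_fixation(unc):
    """Pick the pixel with the highest uncertainty (first one on ties)."""
    cells = [(r, c) for r in range(len(unc)) for c in range(len(unc[r]))]
    return max(cells, key=lambda rc: unc[rc[0]][rc[1]])


def run_agent(scene, n_steps, n_samples=16, seed=0):
    """Fixate, imagine, measure uncertainty, plan the next fixation."""
    rng = random.Random(seed)
    observed, fixation, history = set(), (0, 0), []
    for _ in range(n_steps):
        observed.add(fixation)  # a fixation reveals one pixel in this toy
        samples = imagine_scenes(scene, observed, n_samples, rng)
        unc = uncertainty_map(samples)
        history.append(sum(map(sum, unc)))  # total remaining uncertainty
        fixation = next_fixation(unc)
    return observed, history
```

Calling `run_agent(scene, n_steps=4)` on a small binary grid returns the set of fixated pixels and the total uncertainty recorded at each step. In this toy, every observed pixel has zero variance across the imagined scenes, which mirrors the abstract's point that the variety in the imagined scenes reduces as the agent gathers more fixations.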

