CV · Apr 28, 2018

CRAM: Clued Recurrent Attention Model

arXiv:1804.10844v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses computer vision problems by enhancing recurrent attention models with clues, but it is incremental as it builds on prior attention-based methods.

The paper tackles the poor scalability of convolutional neural networks by proposing CRAM, a recurrent attention model whose attention is guided by clues, and reports better performance than existing methods on image classification and inpainting.

To overcome the poor scalability of convolutional neural networks, the recurrent attention model (RAM) selectively chooses what and where to look in an image. Directing a RAM in how to look at the image can make it even more successful, because the given clue narrows the scope of the possible focus zone. From this perspective, this work proposes the clued recurrent attention model (CRAM), which adds a clue or constraint to the RAM for better problem solving. CRAM follows an encoder-decoder framework: the encoder uses a recurrent attention model with a spatial transformer network, and the decoder varies depending on the task. To validate its performance, CRAM is applied to two computer vision tasks. One is image classification, with the clue given as a binary image saliency map that indicates the approximate location of the object. The other is inpainting, with the clue given as a binary mask that indicates the occluded part. In both tasks, CRAM performs better than existing methods, demonstrating a successful extension of RAM.
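The abstract does not spell out how the encoder attends or how the clue enters the model. As a rough illustration only, the sketch below shows the sampling step of a spatial-transformer-style glimpse in plain Python: an affine matrix `theta` selects a region of the image, and the same region is read from the binary clue so the two can be stacked as channels. The function names (`bilinear_sample`, `affine_glimpse`, `clued_glimpse`), the corner-aligned coordinate convention, and the idea of stacking the clue as a second channel are all assumptions made for this example, not details taken from the paper.

```python
import math

def bilinear_sample(img, x, y):
    """Bilinearly sample a 2D grid (list of rows) at pixel coords (x, y).
    Positions outside the grid contribute zero."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    value = 0.0
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * img[yi][xi]
    return value

def normalized_coords(n):
    """n evenly spaced coordinates in [-1, 1] (a single 0.0 when n == 1)."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]

def affine_glimpse(img, theta, out_h, out_w):
    """Extract an out_h x out_w glimpse from img.
    theta = [[a, b, tx], [c, d, ty]] maps normalized output coordinates
    (u, v) to normalized input coordinates (assumed convention:
    -1 and +1 are the centers of the border pixels)."""
    h, w = len(img), len(img[0])
    glimpse = []
    for v in normalized_coords(out_h):
        row = []
        for u in normalized_coords(out_w):
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Convert normalized coordinates to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            row.append(bilinear_sample(img, px, py))
        glimpse.append(row)
    return glimpse

def clued_glimpse(image, clue, theta, out_h, out_w):
    """Hypothetical helper: read the same region from the image and from
    the binary clue, and return them as two channels."""
    return [affine_glimpse(image, theta, out_h, out_w),
            affine_glimpse(clue, theta, out_h, out_w)]

# Toy data: a 5x5 image and a binary clue marking the central 3x3 region.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
clue = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else 0.0 for c in range(5)]
        for r in range(5)]
theta = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # zoom in 2x on the center
glimpse = clued_glimpse(image, clue, theta, 3, 3)
```

In a recurrent attention model, `theta` would be predicted at each step from the recurrent state rather than fixed by hand. Here it is hard-coded so that the glimpse lands on the region the toy clue marks.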
