CV, Mar 30, 2017

Dynamic Computational Time for Visual Attention

arXiv:1703.10332v3, 117 citations, Has Code
Originality: Incremental advance
AI Analysis

This work improves the efficiency of visual attention models in computer vision. The advance is incremental: it builds on existing recurrent attention methods.

The paper aims to reduce the average processing time of recurrent visual attention models. It introduces a dynamic computational time mechanism that learns when to stop processing each image. On the fine-grained image recognition datasets CUB-200-2011 and Stanford Cars, the model is reported to match the recognition performance of the baseline while using less computation.

We propose a dynamic computational time model to reduce the average processing time of the recurrent visual attention model (RAM). Rather than attending with a fixed number of steps for each input image, the model learns to decide when to stop on the fly. To achieve this, we add a continue/stop action at each time step to RAM and use reinforcement learning to learn both the optimal attention policy and the stopping policy. The modification is simple but can dramatically reduce the average computational time while keeping the same recognition performance as RAM. Experimental results on the CUB-200-2011 and Stanford Cars datasets demonstrate that the dynamic computational time model works effectively for fine-grained image recognition. The source code of this paper can be obtained from https://github.com/baidu-research/DT-RAM
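The continue/stop mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a Bernoulli stopping policy whose per-step logits are given as plain numbers rather than computed from a recurrent state, and it uses a basic REINFORCE-style update with a hypothetical reward and baseline. The attention (glimpse location) policy, which the paper also learns, is left out. All names (`run_episode`, `reinforce_update`, `stop_logits`) are invented for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def run_episode(stop_logits, rng):
    """Sample continue/stop actions until the policy stops or the budget ends.

    stop_logits[t] is a stand-in (assumption) for the stopping network's
    output at step t. Returns the number of steps taken and, per step, the
    gradient of the log-probability of the sampled action w.r.t. the logit.
    """
    steps = 0
    grads = [0.0] * len(stop_logits)
    for t, logit in enumerate(stop_logits):
        steps = t + 1
        if t == len(stop_logits) - 1:
            break  # step budget exhausted: forced stop, no sampled action
        p_stop = sigmoid(logit)
        stop = rng.random() < p_stop
        # For a Bernoulli policy, d log pi(a) / d logit = a - p_stop.
        grads[t] = (1.0 if stop else 0.0) - p_stop
        if stop:
            break
    return steps, grads


def reinforce_update(stop_logits, grads, reward, baseline, lr):
    """One REINFORCE-style ascent step on the logits (illustrative only)."""
    return [z + lr * (reward - baseline) * g
            for z, g in zip(stop_logits, grads)]


rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]
steps, grads = run_episode(logits, rng)
# Hypothetical reward: 1.0 stands for a correct classification.
logits = reinforce_update(logits, grads, reward=1.0, baseline=0.5, lr=0.1)
print(steps, logits)
```

In this sketch the number of steps varies per input because the stop action is sampled at each step, so easy inputs can end early while the final step acts as a hard cap. How the reward trades accuracy against computation is a design choice that this sketch does not take from the paper.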

Code Implementations: 2 repos
