LG, AI, CV | Nov 26, 2023

TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration

arXiv:2311.15335v2 | 2 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work targets computational efficiency for robotic visual exploration in real-world scenarios. It is an incremental advance, since it builds on existing transformer-based AVE methods.

The paper tackles the high computational cost of Active Visual Exploration (AVE) by introducing TORE, a method that reuses tokens and shrinks the decoder to a single block, reducing computational overhead by up to 90% while outperforming state-of-the-art methods.

Active Visual Exploration (AVE) optimizes the utilization of robotic resources in real-world scenarios by sequentially selecting the most informative observations. However, modern methods require a high computational budget due to processing the same observations multiple times through the autoencoder transformers. As a remedy, we introduce a novel approach to AVE called TOken REcycling (TORE). It divides the encoder into extractor and aggregator components. The extractor processes each observation separately, enabling the reuse of tokens passed to the aggregator. Moreover, to further reduce the computations, we decrease the decoder to only one block. Through extensive experiments, we demonstrate that TORE outperforms state-of-the-art methods while reducing computational overhead by up to 90%.
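
The extractor/aggregator split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `extractor`, `aggregator`, and both `explore_*` functions are hypothetical stand-ins (simple arithmetic instead of transformer blocks), and the call counts it produces only illustrate why caching per-observation tokens avoids repeated work. They say nothing about the 90% figure reported in the paper.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def extractor(glimpse: Sequence[float]) -> Token:
    # Hypothetical stand-in for the early encoder blocks: it sees one
    # observation at a time, so its output depends on that observation only.
    return [sum(glimpse) / len(glimpse), max(glimpse)]


def aggregator(tokens: Sequence[Token]) -> Token:
    # Hypothetical stand-in for the later encoder blocks: it fuses the
    # tokens of all observations gathered so far into one state.
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def explore_without_recycling(glimpses: Sequence[Sequence[float]]) -> Tuple[Token, int]:
    # Baseline: at every step, every observation seen so far is pushed
    # through the extractor again.
    extractor_calls = 0
    state: Token = []
    for step in range(1, len(glimpses) + 1):
        tokens = []
        for glimpse in glimpses[:step]:
            tokens.append(extractor(glimpse))
            extractor_calls += 1
        state = aggregator(tokens)
    return state, extractor_calls


def explore_with_recycling(glimpses: Sequence[Sequence[float]]) -> Tuple[Token, int]:
    # Token recycling: each observation is extracted once, its tokens are
    # cached, and only the aggregator is rerun at each step.
    cache: List[Token] = []
    extractor_calls = 0
    state: Token = []
    for glimpse in glimpses:
        cache.append(extractor(glimpse))
        extractor_calls += 1
        state = aggregator(cache)
    return state, extractor_calls


glimpses = [[0.1, 0.4], [0.3, 0.9], [0.5, 0.2], [0.8, 0.6]]
baseline_state, baseline_calls = explore_without_recycling(glimpses)
recycled_state, recycled_calls = explore_with_recycling(glimpses)
print(baseline_calls, recycled_calls)  # 10 4
```

In this toy, both loops end in the same final state, because the extractor output for a glimpse never depends on the other glimpses. That independence is the property that makes cached tokens safe to reuse.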

Code Implementations: 1 repo
