CV · Apr 19

Attention Is not Everything: Efficient Alternatives for Vision

arXiv:2604.17439 · 4.8
AI Analysis

For computer vision researchers, this provides a structured overview of alternatives to Transformers, highlighting their comparative strengths and weaknesses.

This review categorizes non-Transformer vision methods (convolution, MLP, state-space, etc.) from 40 papers, analyzing their efficiency, scalability, interpretability, and robustness to identify challenges and opportunities for future research.
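To make the categories concrete, here is a minimal NumPy sketch of how each family mixes information across image tokens, with single-head self-attention included as the reference point. It is an illustration under simplifying assumptions, not code from the review or the surveyed papers. Tokens are treated as a flattened (N, d) sequence. The convolution is a 1-D depthwise filter rather than a 2-D image convolution. The MLP follows the MLP-Mixer token-mixing pattern, using ReLU in place of GELU. The state-space model is a plain diagonal linear recurrence without the input-dependent parameters of models such as Mamba. All function names are hypothetical.

```python
import numpy as np


def attention_mix(x, wq, wk, wv):
    """Single-head self-attention: every token attends to every token (cost grows as N^2)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


def conv_mix(x, kernel):
    """Depthwise 1-D convolution over tokens: each token sees only a local window of size k."""
    k = kernel.shape[0]  # kernel has shape (k, d); k assumed odd
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the token axis
    return np.stack([(xp[i:i + k] * kernel).sum(axis=0) for i in range(x.shape[0])])


def mlp_token_mix(x, w1, w2):
    """MLP-Mixer-style token mixing: an MLP applied along the token axis, weights tied to N."""
    h = np.maximum(w1 @ x, 0.0)  # (hidden, d); ReLU stands in for GELU
    return w2 @ h  # (N, d)


def ssm_scan(x, a, b, c):
    """Diagonal linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (cost linear in N)."""
    h = np.zeros(x.shape[1])  # one scalar state per channel
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return np.stack(ys)


rng = np.random.default_rng(0)
n_tokens, dim, hidden = 16, 8, 32
x = rng.standard_normal((n_tokens, dim))
outputs = {
    "attention": attention_mix(x, *(rng.standard_normal((dim, dim)) for _ in range(3))),
    "conv": conv_mix(x, rng.standard_normal((3, dim))),
    "mlp": mlp_token_mix(x, rng.standard_normal((hidden, n_tokens)),
                         rng.standard_normal((n_tokens, hidden))),
    "ssm": ssm_scan(x, np.full(dim, 0.9), np.ones(dim), np.ones(dim)),
}
for name, y in outputs.items():
    print(name, y.shape)  # every mixer maps (N, d) -> (N, d)
```

All four share the same interface but differ in reach. Attention is global and data-dependent. The convolution is local. The token MLP is global but its weights are fixed to one token count. The recurrence carries context forward through a compact state.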

Recent advances in computer vision have come mainly from Transformer-based models. However, many non-Transformer methods still perform well and compete directly with them. This review presents a comprehensive taxonomy of such methods, organizing them into categories such as convolution-based, MLP-based, and state-space-based models, among others. The methods are examined in terms of efficiency, scalability, interpretability, and robustness. A total of 40 papers were selected for this study. The goal is to provide an overview of non-Transformer methods and to identify the challenges and opportunities for future computer vision research.
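Efficiency and scalability comparisons between these families usually come down to how token-mixing cost grows with the number of tokens N. The following back-of-the-envelope count is an assumption-laden sketch, not a result from the review. It counts only multiply-accumulates (MACs) in the mixing step and ignores projections, normalization, and channel MLPs. The helper name `token_mixing_macs` and its defaults (7x7 kernel, hidden width 256, state size 16) are hypothetical.

```python
def token_mixing_macs(n_tokens: int, dim: int, kernel: int = 7,
                      hidden: int = 256, state: int = 16) -> dict:
    """Rough MAC counts for one token-mixing layer (hypothetical cost model)."""
    return {
        # softmax(QK^T)V: N x N scores of length-d dot products, then N x N weights times V
        "attention": 2 * n_tokens * n_tokens * dim,
        # depthwise k x k convolution: each token and channel reads k*k neighbours
        "conv": n_tokens * kernel * kernel * dim,
        # MLP-Mixer-style token MLP: (hidden x N) @ (N x d), then (N x hidden) @ (hidden x d)
        "mlp": 2 * n_tokens * hidden * dim,
        # diagonal SSM scan: per token and channel, update and read a size-s state
        "ssm": 3 * n_tokens * dim * state,
    }


if __name__ == "__main__":
    # 224x224 vs 448x448 input with 16x16 patches -> 196 vs 784 tokens
    base = token_mixing_macs(196, 384)
    big = token_mixing_macs(784, 384)
    for name in base:
        print(f"{name:>9}: {base[name]:>12,d} MACs, x{big[name] / base[name]:.0f} at 4x tokens")
```

Under this model, quadrupling the token count multiplies the attention term by 16 but the convolution, MLP, and SSM terms only by 4. This is the quadratic-versus-linear gap typically cited in efficiency comparisons. The MLP term is linear only because its token-MLP weights are sized for a fixed N. Changing the input resolution therefore requires new weights, a scalability limitation that a MAC count does not capture.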
