CV · Mar 19

Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement

arXiv:2603.19503 · 2.1 · h-index: 1
Predicted impact: top 99% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the high computational resource demands of vision models, which could enable deployment in resource-constrained environments. It is an incremental advance, since it builds on existing recursive model concepts.

The paper tackles the problem of parameter-intensive vision models by introducing ViTRM, a parameter-efficient architecture built around a tiny recursive block. It achieves competitive performance on CIFAR-10 and CIFAR-100 with up to 6× fewer parameters than CNN-based models and up to 84× fewer than ViT.

The success of deep learning in computer vision has been driven by models of increasing scale, from deep Convolutional Neural Networks (CNNs) to large Vision Transformers (ViTs). While effective, these architectures are parameter-intensive and demand significant computational resources, which limits deployment in resource-constrained environments. Inspired by Tiny Recursive Models (TRM), which show that small recursive networks can solve complex reasoning tasks through iterative state refinement, we introduce the Vision Tiny Recursion Model (ViTRM): a parameter-efficient architecture that replaces the $L$-layer ViT encoder with a single tiny $k$-layer block ($k = 3$) applied recursively $N$ times. Despite using up to $6\times$ fewer parameters than CNN-based models and up to $84\times$ fewer than ViT, ViTRM maintains competitive performance on CIFAR-10 and CIFAR-100. This demonstrates that recursive computation is a viable, parameter-efficient alternative to architectural depth in vision.
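To make the weight-sharing idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It counts the parameters of a standard Transformer encoder layer and compares a stack of $L$ distinct layers with a single $k$-layer block reused $N$ times. The width `d_model=192`, the MLP size `d_ff=768`, the depth `L=12`, and all function names are illustrative assumptions; none of them come from the paper, and the reported $6\times$ and $84\times$ ratios depend on configurations not given here.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard Transformer encoder layer (with biases)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two linear layers
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attention + mlp + layer_norms


def stacked_encoder_params(num_layers: int, d_model: int, d_ff: int) -> int:
    """L distinct layers: parameters grow linearly with depth."""
    return num_layers * encoder_layer_params(d_model, d_ff)


def recursive_encoder_params(k: int, d_model: int, d_ff: int, num_recursions: int) -> int:
    """One k-layer block reused num_recursions times.

    The weights are shared across recursions, so num_recursions changes the
    compute cost but not the parameter count.
    """
    return k * encoder_layer_params(d_model, d_ff)


def refine(block, state, num_recursions: int):
    """Apply the same block repeatedly to refine a state (toy illustration)."""
    for _ in range(num_recursions):
        state = block(state)
    return state


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    d_model, d_ff, L, k, N = 192, 768, 12, 3, 4
    stacked = stacked_encoder_params(L, d_model, d_ff)
    recursive = recursive_encoder_params(k, d_model, d_ff, N)
    print(f"stacked ({L} layers): {stacked:,} parameters")
    print(f"recursive ({k} layers x {N} passes): {recursive:,} parameters")
    print(f"reduction from sharing alone: {stacked / recursive:.1f}x")
```

Under these assumptions, sharing reduces the encoder's parameter count by a factor of $L/k$ while the effective depth of computation is $k \cdot N$ layer applications. Embeddings, the classification head, and any change in width are ignored in this sketch.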

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
