LG, AI | Nov 20, 2025

ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation

Carlos Boned Riera, David Romero Sanchez, Oriol Ramos Terrades

arXiv:2511.16501v1 | 4.1 | h-index: 17

Originality: Incremental advance

AI Analysis

The work addresses efficiency and interpretability issues in computer vision models, which matter to both researchers and practitioners. It is an incremental advance because it builds on existing connections between ODEs and Transformers.

The paper tackles the high computational and storage demands of large Vision Transformers by introducing ODE-ViT, a reformulation of the ViT as an ordinary differential equation system. On the CIFAR datasets it achieves stable and competitive performance with up to one order of magnitude fewer parameters.

In recent years, increasingly large models have achieved outstanding performance across computer vision (CV) tasks. However, these models demand substantial computational resources and storage, and their growing complexity limits our understanding of how they make decisions. Most of these architectures rely on the attention mechanism within Transformer-based designs. Building upon the connection between residual neural networks and ordinary differential equations (ODEs), we introduce ODE-ViT, a Vision Transformer reformulated as an ODE system that satisfies the conditions for well-posed and stable dynamics. Experiments on CIFAR-10 and CIFAR-100 demonstrate that ODE-ViT achieves stable, interpretable, and competitive performance with up to one order of magnitude fewer parameters, surpassing prior ODE-based Transformer approaches in classification tasks. We further propose a plug-and-play teacher-student framework in which a discrete ViT guides the continuous trajectory of ODE-ViT by treating the intermediate representations of the teacher as solutions of the ODE. This strategy improves performance by more than 10% compared to training a free ODE-ViT from scratch.
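
The two ideas in the abstract can be illustrated with a small sketch. A residual update x_{l+1} = x_l + f(x_l) is a forward Euler step of dx/dt = f(x) with step size 1, so a stack of attention blocks that share weights can be read as a discretized ODE. The teacher-student idea can then be sketched as a loss that pulls the ODE states at chosen time points toward the teacher's per-layer representations. Everything below is an assumption made for illustration: the single-head attention field, the Euler solver, the mean-squared-error loss, and all function names (`attention_field`, `euler_trajectory`, `residual_stack`, `trajectory_matching_loss`) are hypothetical and not taken from the paper, whose architecture, solver, and objective are not specified in the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_field(x, params):
    """Hypothetical vector field f(x): single-head self-attention.

    x has shape (tokens, dim); params is a tuple (wq, wk, wv) of (dim, dim) matrices.
    """
    wq, wk, wv = params
    q, k, v = x @ wq, x @ wk, x @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

def euler_trajectory(x0, params, n_steps, t_end=1.0):
    """Integrate dx/dt = f(x) with forward Euler and return every state.

    The same weights are reused at every step (assumed weight sharing).
    """
    h = t_end / n_steps
    states = [x0]
    for _ in range(n_steps):
        states.append(states[-1] + h * attention_field(states[-1], params))
    return np.stack(states)

def residual_stack(x0, layer_params):
    """Discrete 'teacher': one residual attention block per entry of layer_params."""
    states = [x0]
    for p in layer_params:
        states.append(states[-1] + attention_field(states[-1], p))
    return np.stack(states)

def trajectory_matching_loss(student_states, teacher_states):
    """Assumed objective: mean squared error between matched states."""
    return float(np.mean((student_states - teacher_states) ** 2))

def random_params(rng, dim, scale=0.1):
    return tuple(rng.normal(scale=scale, size=(dim, dim)) for _ in range(3))

rng = np.random.default_rng(0)
tokens, dim, depth = 4, 8, 6
x0 = rng.normal(size=(tokens, dim))

# Teacher: independent weights per layer (depth times the parameters).
teacher = residual_stack(x0, [random_params(rng, dim) for _ in range(depth)])

# Student: one shared set of weights, integrated from t=0 to t=depth with step size 1,
# so that state l lines up with teacher layer l.
shared = random_params(rng, dim)
student = euler_trajectory(x0, shared, n_steps=depth, t_end=float(depth))

loss = trajectory_matching_loss(student, teacher)
print(f"trajectory matching loss: {loss:.6f}")
```

In this toy setup the student stores one set of attention weights while the teacher stores one per layer, which is where a parameter saving of the kind the abstract reports would come from. In a real training loop the loss would be minimized with respect to the shared weights using an automatic differentiation framework.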
