CV · AI · SP · May 12

Spectral Vision Transformer for Efficient Tokenization with Limited Data

Alexandra G. Roberts, Maneesh John, Jinwei Zhang, Dominick Romano, Mert Sisman, Ki Sueng Choi, Heejong Kim, Mert R. Sabuncu, Thanh D. Nguyen, Alexey V. Dimov, Pascal Spincemaille, Brian H. Kopell

arXiv:2605.12026 · 77.6 · Has Code

Predicted impact: top 39% in CV · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying vision transformers to medical imaging with limited data by proposing a more efficient tokenization method.

The paper introduces a spectral vision transformer that achieves efficient tokenization with limited data, demonstrating superior or comparable performance to existing models with fewer parameters across medical imaging tasks.

We propose a novel spectral vision transformer architecture for efficient tokenization with limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis, including spatial invariance and optimal signal-to-noise ratio. We show that the spectral projection reduces complexity compared to spatial vision transformers. We show comparable or superior performance with fewer parameters than a variety of models, including compact and standard vision transformers, convolutional neural networks with attention, shifted window transformers, multi-layer perceptrons, and logistic regression. We include simulated, public, and clinical data in our analysis and release our code at github.com/agr78/spectralViT.
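To make the idea of spectral tokenization concrete, here is a minimal sketch. The abstract does not name the basis or the token layout, so everything here is an assumption for illustration only: the 2D discrete Fourier transform as the basis, coefficient magnitudes as features, and one token per row of a retained low-frequency block. The function names `patch_tokens` and `spectral_tokens` and the parameters `patch` and `keep` are hypothetical and are not taken from the released code. The sketch compares token counts against ordinary patch tokenization and checks one well-known property of Fourier magnitudes, invariance to circular shifts.

```python
import numpy as np


def patch_tokens(image, patch=4):
    """Spatial baseline: one token per non-overlapping patch x patch block."""
    h, w = image.shape
    blocks = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return blocks.reshape(-1, patch * patch)


def spectral_tokens(image, keep=8):
    """Assumed spectral variant: magnitudes of the keep x keep lowest
    frequencies of the 2D DFT, with each retained row used as one token."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move DC to the center
    h, w = image.shape
    r0, c0 = h // 2 - keep // 2, w // 2 - keep // 2
    low = spectrum[r0:r0 + keep, c0:c0 + keep]      # low-frequency block
    return np.abs(low)                               # shape (keep, keep)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

spatial = patch_tokens(image)      # shape (64, 16)
spectral = spectral_tokens(image)  # shape (8, 8)

# Self-attention cost grows with the square of the token count.
print("spatial tokens:", spatial.shape[0], "pairs:", spatial.shape[0] ** 2)
print("spectral tokens:", spectral.shape[0], "pairs:", spectral.shape[0] ** 2)

# DFT magnitudes do not change under a circular shift of the input.
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))
print("shift invariant:", np.allclose(spectral, spectral_tokens(shifted)))
```

Under these assumptions the token count is set by `keep` rather than by image size, which is one way a spectral projection could shrink the attention cost relative to patch tokens. The paper's actual basis and truncation rule may differ.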
