CV · AI · LG · Feb 1

Understanding vision transformer robustness through the lens of out-of-distribution detection

arXiv:2602.01459v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of keeping vision transformers robust under quantization for real-time vision applications. It is incremental, building on existing quantization and OOD detection methods.

The paper investigates how quantization affects vision transformers' robustness in out-of-distribution (OOD) detection. It finds that models pretrained on the larger ImageNet-22k suffer larger drops from full precision to 4-bit (15.0% and 19.2% average delta in AUPR-out for ViT and DeiT3) than their ImageNet-1k-only counterparts (9.5% and 12.0%), and suggests that data augmentation may be a more beneficial option than large-scale pretraining.

Vision transformers have shown remarkable performance in vision tasks, but enabling them for accessible and real-time use is still challenging. Quantization reduces memory and inference costs at the risk of performance loss. Strides have been made to mitigate low-precision issues, mainly by understanding in-distribution (ID) task behaviour, but the attention mechanism may provide insight into quantization attributes when out-of-distribution (OOD) situations are explored. We investigate the behaviour of quantized small variants of popular vision transformers (DeiT, DeiT3, and ViT) on common OOD datasets. ID analyses show the initial instabilities of 4-bit models, particularly of those trained on the larger ImageNet-22k: the strongest FP32 model, DeiT3, drops sharply by 17% due to quantization error and becomes one of the weakest 4-bit models. While ViT shows reasonable quantization robustness for ID calibration, OOD detection reveals more: ViT and DeiT3 pretrained on ImageNet-22k experienced average quantization deltas in AUPR-out of 15.0% and 19.2%, respectively, from full precision to 4-bit, while their ImageNet-1k-only counterparts experienced deltas of 9.5% and 12.0%. Overall, our results suggest that pretraining on large-scale datasets may hinder low-bit quantization robustness in OOD detection and that data augmentation may be a more beneficial option.
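To make the headline metric concrete, the following is a minimal sketch of how an AUPR-out score and a quantization delta could be computed. It rests on several assumptions not stated in the abstract: the detector score is a per-sample confidence such as the maximum softmax probability, AUPR-out treats OOD samples as the positive class, and the delta is the plain difference between the full-precision and 4-bit scores. The function names and the toy confidence values are hypothetical and are not taken from the paper.

```python
import numpy as np

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties are not treated specially)."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    ranked = labels[order]
    true_pos = np.cumsum(ranked)
    precision = true_pos / np.arange(1, len(ranked) + 1)
    # Average the precision at each rank where a positive sample is retrieved.
    return float(np.sum(precision * ranked) / ranked.sum())

def aupr_out(id_conf, ood_conf):
    """AUPR with OOD samples as positives (assumed definition).

    Lower confidence should indicate "more OOD", so confidences are negated
    to turn them into OOD scores.
    """
    scores = -np.concatenate([id_conf, ood_conf])
    labels = np.concatenate([np.zeros(len(id_conf)), np.ones(len(ood_conf))])
    return average_precision(labels, scores)

def quantization_delta(full_precision_score, quantized_score):
    """Assumed definition: full-precision score minus quantized score."""
    return full_precision_score - quantized_score

# Toy confidences (hypothetical, not data from the paper).
fp32_id, fp32_ood = np.array([0.9, 0.8]), np.array([0.2, 0.1])
int4_id, int4_ood = np.array([0.9, 0.3]), np.array([0.5, 0.1])

fp32_aupr = aupr_out(fp32_id, fp32_ood)
int4_aupr = aupr_out(int4_id, int4_ood)
print(f"FP32 AUPR-out:  {fp32_aupr:.3f}")
print(f"4-bit AUPR-out: {int4_aupr:.3f}")
print(f"Delta:          {quantization_delta(fp32_aupr, int4_aupr):.3f}")
```

In the toy data the 4-bit model assigns one ID sample a lower confidence than one OOD sample, so the ranking is no longer perfect and AUPR-out falls. Averaging such deltas over several OOD datasets would give a figure comparable in spirit to the averages quoted in the abstract.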
