LG · Nov 30, 2025

Estimating the Effective Rank of Vision Transformers via Low-Rank Factorization

arXiv:2512.00792v1

Originality: Incremental advance

AI Analysis

The framework offers a practical tool for characterizing compressibility and intrinsic dimensionality in deep networks, but the advance is incremental: it builds on existing low-rank factorization and distillation methods.

The paper addresses the problem of estimating the intrinsic dimensionality of deep networks, specifically Vision Transformers. It defines effective rank as a region rather than a single value: the range of ranks over which a factorized student reaches 85-95% of teacher accuracy. On ViT-B/32 fine-tuned on CIFAR-100, the reported effective-rank region is approximately [16, 34], and the rank-32 student retains about 94.7% of the teacher's top-1 accuracy (69.46% vs. 73.35%).

Deep networks are heavily over-parameterized, yet their learned representations often admit low-rank structure. We introduce a framework for estimating a model's intrinsic dimensionality by treating learned representations as projections onto a low-rank subspace of the model's full capacity. Our approach: train a full-rank teacher, factorize its weights at multiple ranks, and train each factorized student via distillation to measure performance as a function of rank. We define effective rank as a region, not a point: the smallest contiguous set of ranks for which the student reaches 85-95% of teacher accuracy. To stabilize estimates, we fit accuracy vs. rank with a monotone PCHIP interpolant and identify crossings of the normalized curve. We also define the effective knee as the rank maximizing perpendicular distance between the smoothed accuracy curve and its endpoint secant, an intrinsic indicator of where marginal gains concentrate. On ViT-B/32 fine-tuned on CIFAR-100 (one seed, due to compute constraints), factorizing linear blocks and training with distillation yields an effective-rank region of approximately [16, 34] and an effective knee at r* ~ 31. At rank 32, the student attains 69.46% top-1 accuracy vs. 73.35% for the teacher (~94.7% of baseline) while achieving substantial parameter compression. We provide a framework to estimate effective-rank regions and knees across architectures and datasets, offering a practical tool for characterizing the intrinsic dimensionality of deep models.
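
The region and knee estimates described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The function names, the running-maximum step used to make the data monotone, the linear rank axis, the rescaling of both axes to [0, 1] before measuring distance, and the reading of the region as "first crossing of 85% to first crossing of 95%" are all assumptions. The accuracy values are made up for the example and are not the paper's measurements.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def fit_normalized_curve(ranks, student_acc, teacher_acc):
    """Fit a PCHIP curve to student accuracy divided by teacher accuracy."""
    ranks = np.asarray(ranks, dtype=float)
    norm = np.asarray(student_acc, dtype=float) / teacher_acc
    # Assumption: a running maximum removes small dips so the data are
    # non-decreasing; PCHIP then yields a monotone interpolant.
    norm = np.maximum.accumulate(norm)
    return PchipInterpolator(ranks, norm)


def effective_rank_region(curve, r_min, r_max, lo=0.85, hi=0.95, n_grid=4001):
    """Ranks where the curve first reaches `lo` and `hi` (None if never)."""
    grid = np.linspace(r_min, r_max, n_grid)
    values = curve(grid)

    def first_crossing(level):
        hits = np.nonzero(values >= level)[0]
        return float(grid[hits[0]]) if hits.size else None

    return first_crossing(lo), first_crossing(hi)


def effective_knee(curve, r_min, r_max, n_grid=4001):
    """Rank with maximum perpendicular distance to the endpoint secant."""
    grid = np.linspace(r_min, r_max, n_grid)
    values = curve(grid)
    # Assumption: rescale both axes to [0, 1] so the distance is not
    # dominated by the much larger numeric range of the rank axis.
    # This requires the curve to differ at its two endpoints.
    x = (grid - grid[0]) / (grid[-1] - grid[0])
    y = (values - values[0]) / (values[-1] - values[0])
    # After rescaling, the secant joins (0, 0) and (1, 1), so the
    # perpendicular distance of (x, y) to it is |y - x| / sqrt(2).
    dist = np.abs(y - x) / np.sqrt(2.0)
    return float(grid[np.argmax(dist)])


# Made-up numbers for illustration only; NOT the paper's measurements.
ranks = [1, 2, 4, 8, 16, 32, 64, 128]
student_acc = [0.10, 0.20, 0.35, 0.55, 0.70, 0.75, 0.78, 0.79]
teacher_acc = 0.80

curve = fit_normalized_curve(ranks, student_acc, teacher_acc)
print("effective-rank region:", effective_rank_region(curve, ranks[0], ranks[-1]))
print("effective knee:", effective_knee(curve, ranks[0], ranks[-1]))
```

Reporting a region rather than a single rank makes the estimate less sensitive to noise in any one student run, which matters when, as in the abstract, only one seed is available. Whether the knee is measured on a linear or logarithmic rank axis changes its location, so that choice should be stated alongside any reported value.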
