LG, AI, CV · Nov 12, 2025

Stratified Knowledge-Density Super-Network for Scalable Vision Transformers

arXiv:2511.11683v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the problem of scalable deployment of vision transformers for varying computational resources, presenting an incremental improvement in model compression and expansion techniques.

The paper tackles the inefficiency of training multiple vision transformer models for different resource constraints by proposing a stratified knowledge-density super-network that enables flexible extraction of sub-networks with maximal knowledge retention. The method, combining Weighted PCA for Attention Contraction and Progressive Importance-Aware Dropout, outperforms existing pruning criteria and offers a strong alternative to state-of-the-art compression and expansion methods.

Training and deploying multiple vision transformer (ViT) models for different resource constraints is costly and inefficient. To address this, we propose transforming a pre-trained ViT into a stratified knowledge-density super-network, where knowledge is hierarchically organized across weights. This enables flexible extraction of sub-networks that retain maximal knowledge for varying model sizes. We introduce Weighted PCA for Attention Contraction (WPAC), which concentrates knowledge into a compact set of critical weights. WPAC applies token-wise weighted principal component analysis to intermediate features and injects the resulting transformation and inverse matrices into adjacent layers, preserving the original network function while enhancing knowledge compactness. To further promote stratified knowledge organization, we propose Progressive Importance-Aware Dropout (PIAD). PIAD progressively evaluates the importance of weight groups, updates an importance-aware dropout list, and trains the super-network under this dropout regime to promote knowledge stratification. Experiments demonstrate that WPAC outperforms existing pruning criteria in knowledge concentration, and the combination with PIAD offers a strong alternative to state-of-the-art model compression and model expansion methods.
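
The function-preserving step described for WPAC can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes two adjacent linear maps with no nonlinearity between them (for example, a value projection followed by an output projection), and it uses random token weights because the abstract does not say how the weights are computed. All names (`weighted_pca_basis`, `inject_rotation`, `w_in`, `w_out`) are hypothetical.

```python
import numpy as np


def weighted_pca_basis(features, token_weights):
    """Weighted PCA over tokens.

    Returns an orthogonal basis whose columns are sorted by decreasing
    weighted variance, together with the sorted eigenvalues.
    """
    w = token_weights / token_weights.sum()      # normalise token weights
    mean = w @ features                          # weighted feature mean
    centered = features - mean
    cov = (centered * w[:, None]).T @ centered   # weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    return eigvecs[:, order], eigvals[order]


def inject_rotation(w_in, b_in, w_out, basis):
    """Fold the basis into the producing layer and its inverse into the
    consuming layer. The basis is orthogonal, so its inverse is its transpose."""
    return w_in @ basis, b_in @ basis, basis.T @ w_out


# Toy data: the token weights are random here (an assumption of this sketch).
rng = np.random.default_rng(0)
n_tokens, d_model, d_hidden = 256, 16, 8
x = rng.normal(size=(n_tokens, d_model))
w_in = rng.normal(size=(d_model, d_hidden))
b_in = rng.normal(size=d_hidden)
w_out = rng.normal(size=(d_hidden, d_model))
token_weights = rng.uniform(0.1, 1.0, size=n_tokens)

hidden = x @ w_in + b_in                         # intermediate features
basis, eigvals = weighted_pca_basis(hidden, token_weights)
w_in_r, b_in_r, w_out_r = inject_rotation(w_in, b_in, w_out, basis)

# The rotated pair of layers computes the same function as the original pair.
y_ref = hidden @ w_out
y_rot = (x @ w_in_r + b_in_r) @ w_out_r
max_abs_diff = np.abs(y_ref - y_rot).max()

# A smaller sub-network keeps only the first k rotated hidden channels.
k = 4
y_top = (x @ w_in_r[:, :k] + b_in_r[:k]) @ w_out_r[:k]
```

After the rotation, the hidden channels are ordered by weighted variance, so slicing the first `k` channels is one plausible way to extract a smaller sub-network. How closely the truncated output tracks the full one depends on the data and on `w_out`, so the sketch makes no claim about it.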
