CV, Dec 30, 2021

Improving the Behaviour of Vision Transformers with Token-consistent Stochastic Layers

arXiv:2112.15111v3
Originality: Incremental advance
AI Analysis

This work addresses robustness and privacy issues in vision transformers, though it appears incremental as it modifies existing layers without altering the core architecture.

The paper introduces token-consistent stochastic layers into vision transformers. These layers improve network calibration, robustness, and privacy without a severe drop in performance, which the paper demonstrates by improving on established baselines in three applications.

We introduce token-consistent stochastic layers in vision transformers, without causing any severe drop in performance. The added stochasticity improves network calibration and robustness, and strengthens privacy. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations preserve the topological structure formed by the set of tokens passing through the shared multilayer perceptron. This operation encourages the learning of the recognition task to rely on the topological structures of the tokens, instead of their values, which in turn offers the desired robustness and privacy of the visual features. The effectiveness of the token-consistent stochasticity is demonstrated on three different applications, namely network calibration, adversarial robustness, and feature privacy, by boosting the performance of the respective established baselines.
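
To make the idea of "token-consistent" stochastic parameters concrete, here is a minimal sketch in plain Python. It is one possible reading of the abstract, not the authors' implementation: it assumes the stochastic linear operation is a per-feature scaling, and the function name `token_consistent_scale` and the bounds `low` and `high` are hypothetical. The only properties taken from the abstract are that the parameters are drawn from a uniform distribution and that the same sampled parameters are applied to every token.

```python
import random


def token_consistent_scale(tokens, low=0.5, high=1.5, rng=None):
    """Scale all tokens by the same randomly sampled per-feature factors.

    tokens: list of tokens, each a list of floats of equal length.
    low, high: hypothetical bounds of the uniform distribution
               (illustrative values, not taken from the paper).
    rng: optional random.Random instance for reproducibility.
    Returns the scaled tokens and the sampled scale vector.
    """
    rng = rng or random
    dim = len(tokens[0])
    # Sample one factor per feature, once per forward pass.
    scale = [rng.uniform(low, high) for _ in range(dim)]
    # Reuse the same factors for every token: this sharing is the
    # "token-consistent" part of the sketch.
    scaled = [[s * x for s, x in zip(scale, tok)] for tok in tokens]
    return scaled, scale


# Usage: three tokens with two features each.
tokens = [[1.0, -2.0], [3.0, 4.0], [5.0, 6.0]]
scaled, scale = token_consistent_scale(tokens, rng=random.Random(0))
```

Because every factor in this sketch is positive and shared across tokens, the ordering of tokens along each feature is unchanged even though the values themselves differ on every call. That is a simple illustration of relying on the relations between tokens rather than on their raw values. Sampling would happen on every call, matching the abstract's statement that sampling is used during both training and inference.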
