LG · NA · Jul 10, 2025

Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers

Nikolay Yudin, Alexander Gaponov, Sergei Kudriashov, Maxim Rakhuba

arXiv:2507.07814v1 · 19.78 citations · h-index: 6

Originality: Highly original

AI Analysis

Aimed at machine learning practitioners, this work addresses robustness issues in transformers and offers an incremental improvement in the form of a new regularization method.

The paper tackles transformer robustness by deriving a new local Lipschitz bound for self-attention blocks. The bound builds on a refined expression for the spectral norm of the softmax function, exposes how the Lipschitz constant depends on attention score maps, and motivates the proposed JaSMin regularization, which yields a 10-15% increase in adversarial robustness.
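The following worked example uses only the standard softmax Jacobian. It is not the refined closed-form expression derived in the paper; it simply illustrates why the distribution of a single attention row p matters for the local Lipschitz behaviour of softmax. Softmax outputs lie in the interior of the probability simplex, so the two-key and one-hot cases below should be read as limits.

```latex
% Softmax Jacobian of one attention row p = softmax(z), p_i > 0, \sum_i p_i = 1
J(p) = \frac{\partial\,\mathrm{softmax}(z)}{\partial z} = \operatorname{diag}(p) - p p^{\top},
\qquad
v^{\top} J(p)\, v = \sum_i p_i v_i^2 - \Big(\sum_i p_i v_i\Big)^2 = \operatorname{Var}_p(v) \ge 0 .

% Hence, for every p (Popoviciu's inequality on a unit vector v):
\|J(p)\|_2 = \max_{\|v\|_2 = 1} \operatorname{Var}_p(v) \le \tfrac12 .

% Uniform row over n keys, p = \tfrac{1}{n}\mathbf{1}:
J = \tfrac{1}{n}\Big(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\Big)
\;\Rightarrow\; \|J\|_2 = \tfrac{1}{n} .

% Mass split evenly between two keys, p = (\tfrac12, \tfrac12, 0, \dots, 0):
J = \tfrac14 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \oplus 0
\;\Rightarrow\; \|J\|_2 = \tfrac12 .

% One-hot row, p = e_k:
J = e_k e_k^{\top} - e_k e_k^{\top} = 0
\;\Rightarrow\; \|J\|_2 = 0 .
```

The same softmax is therefore insensitive for a one-hot row, mildly sensitive for attention spread uniformly over many keys, and maximally sensitive when the mass is split between two keys.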

We present a novel local Lipschitz bound for self-attention blocks of transformers. This bound is based on a refined closed-form expression for the spectral norm of the softmax function. The resulting bound is not only more accurate than those in prior work but also reveals how the Lipschitz constant depends on attention score maps. Based on these findings, we suggest an explanation, from the Lipschitz-constant perspective, of how the distributions inside the attention map affect robustness. We also introduce a new lightweight regularization term called JaSMin (Jacobian Softmax norm Minimization), which boosts the transformer's robustness and decreases the local Lipschitz constants of the whole network.
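The sketch below shows how a penalty of this kind might be evaluated on an attention map. It rests on a loud assumption: the exact JaSMin term, its weighting in the loss, and how its gradient is obtained are not given in this summary, so here the penalty is hypothetically taken to be the mean spectral norm of the standard per-row softmax Jacobian. The names `jasmin_penalty`, `spectral_norm_psd`, and `softmax_jacobian` are illustrative only, and the spectral norm is estimated with plain power iteration rather than the paper's closed-form expression.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one row of attention scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(p):
    """Standard softmax Jacobian J = diag(p) - p p^T (symmetric PSD)."""
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def spectral_norm_psd(a, iters=200):
    """Largest eigenvalue (= spectral norm) of a symmetric PSD matrix via power iteration."""
    n = len(a)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-constant start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the matrix annihilates v, e.g. J = 0 for a one-hot row
            return 0.0
        v = [x / norm for x in w]
    w = matvec(a, v)
    return sum(x * y for x, y in zip(v, w))  # Rayleigh quotient of the unit vector v


def jasmin_penalty(scores):
    """HYPOTHETICAL penalty: mean spectral norm of per-row softmax Jacobians."""
    rows = [softmax(row) for row in scores]
    return sum(spectral_norm_psd(softmax_jacobian(p)) for p in rows) / len(rows)


if __name__ == "__main__":
    maps = {
        "uniform": [[0.0, 0.0, 0.0, 0.0]] * 4,          # attention spread evenly
        "two-way split": [[5.0, 5.0, -5.0, -5.0]] * 4,  # mass shared by two keys
        "peaked": [[10.0, 0.0, 0.0, 0.0]] * 4,          # nearly one-hot attention
    }
    for name, scores in maps.items():
        print(f"{name}: {jasmin_penalty(scores):.4f}")
```

In an actual training loop such a term would be written in an autodiff framework and added to the task loss with some weight; the pure-Python version here only evaluates the value, which makes it easy to see that a two-way split scores close to 1/2 while a nearly one-hot row scores close to 0.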
