LG | Feb 24, 2023

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

arXiv:2302.12480v116.516 citations | h-index: 81 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently encoding and transferring robustness in machine learning models. It offers practitioners a lightweight and flexible solution, though the advance is incremental because it builds on existing robustness training methods.

The paper tackles the problem of understanding and transferring robustness in neural networks. It proposes that robustness to distribution shifts is encoded linearly in weight differences, called Robust Weight Signatures (RWSs), which can be added to a clean model to patch in robustness with minimal storage and effort. The results show that storing an RWS is up to 13× more compact than storing a full weight copy, and that RWSs allow adjustable, composable, and transferable robustness patching.

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models? This paper empirically suggests a surprisingly simple answer: linearly, by straightforward model weight arithmetic! We start by drawing several key observations: (1) assuming that we train the same model architecture on both a clean dataset and its corrupted version, the resultant weights mostly differ in shallow layers; (2) the weight difference after projection, which we call the "Robust Weight Signature" (RWS), appears to be discriminative and indicative of different corruption types; (3) for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. We propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted RWSs. In this way, injecting a certain robustness into the model is reduced to directly adding the corresponding RWS to its weights. We verify our proposed framework to be remarkably (1) lightweight: since RWSs concentrate on the shallowest few layers and we further show they can be painlessly quantized, storing an RWS is up to 13× more compact than storing the full weight copy; (2) in-situ adjustable: RWSs can be appended as needed and later taken off to restore the intact clean model, and we further demonstrate that one can linearly re-scale the RWS to control the patched robustness strength; (3) composable: multiple RWSs can be added simultaneously to patch more comprehensive robustness at once; and (4) transferable: even when the clean model backbone is continually adapted or updated, RWSs remain effective patches due to their outstanding cross-dataset transferability.
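The weight arithmetic described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code. It assumes a model is a plain dictionary of weight arrays, takes the RWS as the raw difference between robust and clean weights on a chosen set of shallow layers, and omits the projection and quantization steps, whose details the abstract does not give. The names `extract_rws` and `patch`, the layer names, and the toy numbers are all hypothetical.

```python
import numpy as np

def extract_rws(clean, robust, layers):
    """Hypothetical helper: raw weight difference on the selected layers only."""
    return {name: robust[name] - clean[name] for name in layers}

def patch(weights, rws, alpha=1.0):
    """Return a copy of `weights` with alpha * rws added to the matching layers."""
    patched = {name: w.copy() for name, w in weights.items()}
    for name, delta in rws.items():
        patched[name] = patched[name] + alpha * delta
    return patched

# Toy stand-ins for a clean model and two robustly trained copies.
# In this toy setup only the shallow layer "conv1" differs (an assumption).
rng = np.random.default_rng(0)
clean = {"conv1": rng.normal(size=(4, 3)), "fc": rng.normal(size=(2, 4))}
robust_noise = {"conv1": clean["conv1"] + 0.10, "fc": clean["fc"].copy()}
robust_blur = {"conv1": clean["conv1"] - 0.05, "fc": clean["fc"].copy()}

rws_noise = extract_rws(clean, robust_noise, ["conv1"])
rws_blur = extract_rws(clean, robust_blur, ["conv1"])

patched = patch(clean, rws_noise)                 # add one signature
half = patch(clean, rws_noise, alpha=0.5)         # re-scale the patch strength
both = patch(patch(clean, rws_noise), rws_blur)   # compose two signatures
restored = patch(patched, rws_noise, alpha=-1.0)  # take the patch off again
```

Because only the shallow-layer differences are stored, each signature here holds one array instead of a full weight copy, which is the intuition behind the storage savings the abstract reports.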
