CODED-SMOOTHING: Coding Theory Helps Generalization
The work addresses overfitting and vulnerability to adversarial attacks, two problems that matter to ML practitioners, though the contribution appears incremental because it adapts an existing paradigm from distributed computing.
The paper aims to improve generalization and adversarial robustness in machine learning models. It introduces a coded-smoothing module that regularizes learning by processing linear combinations of the data, and it reports state-of-the-art robustness against gradient-based adversarial attacks.
We introduce the coded-smoothing module, which can be seamlessly integrated into standard training pipelines, both supervised and unsupervised, to regularize learning and improve generalization with minimal computational overhead. In addition, it can be incorporated into the inference pipeline to randomize the model and enhance robustness against adversarial perturbations. The design of coded-smoothing is inspired by general coded computing, a paradigm originally developed to mitigate straggler and adversarial failures in distributed computing by processing linear combinations of the data rather than the raw inputs. Building on this principle, we adapt coded computing to machine learning by designing an efficient and effective regularization mechanism that encourages smoother representations and more generalizable solutions. Extensive experiments on both supervised and unsupervised tasks demonstrate that coded-smoothing consistently improves generalization and achieves state-of-the-art robustness against gradient-based adversarial attacks.
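To make "processing linear combinations of the data rather than the raw inputs" concrete, the following is a minimal sketch in plain Python. It is not the paper's encoder, decoder, or loss: the random convex encoding weights, the helper names (`encode`, `random_convex_weights`, `smoothness_penalty`), and the consistency penalty are all assumptions chosen for illustration. The sketch mixes a small batch into coded points, then measures how far a model's outputs on the coded points are from the same mixture of its outputs on the raw points.

```python
import random

def encode(weights, rows):
    """Return one linear combination of `rows` per weight vector in `weights`."""
    width = len(rows[0])
    return [
        [sum(w * row[j] for w, row in zip(w_row, rows)) for j in range(width)]
        for w_row in weights
    ]

def random_convex_weights(num_coded, batch_size, rng):
    """Hypothetical encoder: every weight row is non-negative and sums to 1."""
    rows = []
    for _ in range(num_coded):
        raw = [rng.random() + 1e-9 for _ in range(batch_size)]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows

def smoothness_penalty(model, weights, batch):
    """Mean squared gap between model(coded inputs) and coded model(raw inputs).

    Illustrative regularizer only (an assumption, not the paper's loss).
    """
    out_of_coded = [model(x) for x in encode(weights, batch)]
    coded_of_out = encode(weights, [model(x) for x in batch])
    gaps = [
        (a - b) ** 2
        for u, v in zip(out_of_coded, coded_of_out)
        for a, b in zip(u, v)
    ]
    return sum(gaps) / len(gaps)

def linear_model(x):
    # A linear map commutes with linear combinations, so its penalty is ~0.
    return [2.0 * x[0] - x[1], 0.5 * x[1]]

def nonlinear_model(x):
    # A nonlinear map generally does not commute, so its penalty is positive.
    return [x[0] ** 2, x[0] * x[1]]

rng = random.Random(0)
batch = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(4)]
weights = random_convex_weights(num_coded=3, batch_size=4, rng=rng)

coded = encode(weights, batch)  # 3 coded points built from 4 raw points
linear_gap = smoothness_penalty(linear_model, weights, batch)
nonlinear_gap = smoothness_penalty(nonlinear_model, weights, batch)
print(f"linear gap: {linear_gap:.3e}, nonlinear gap: {nonlinear_gap:.3e}")
```

Under these assumptions the penalty vanishes (up to floating-point error) for the linear model and is positive for the nonlinear one, which is the sense in which a term of this kind would push a model toward smoother behavior between data points. At inference time, drawing fresh random weights on each call would be one way to randomize the computation, though how the paper does this is not specified in the abstract.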