CV, LG | May 23, 2023

Decoupled Kullback-Leibler Divergence Loss

arXiv:2305.13948v3 | 102 citations | Has Code
Originality: Incremental advance
AI Analysis

This work provides an improved loss function for machine learning practitioners in adversarial training and knowledge distillation, though it is incremental as it builds on existing KL divergence formulations.

The paper tackles the limitations of Kullback-Leibler divergence loss by mathematically proving its equivalence to a decoupled form and introducing improvements to address asymmetry and sample bias, resulting in new state-of-the-art adversarial robustness on RobustBench and competitive performance in knowledge distillation tasks.

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of the DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Secondly, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness by conducting experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public RobustBench leaderboard and competitive performance on knowledge distillation, demonstrating its substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
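As background for the "Cross-Entropy loss incorporating soft labels" mentioned in the abstract, the sketch below checks the standard textbook identity KL(p || q) = CE(p, q) - H(p) on a toy example. It is not the paper's DKL decomposition and does not implement the wMSE term; the function names and the example logits are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def soft_cross_entropy(p, q):
    # Cross-entropy with soft labels p: -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i * log(p_i)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical logits standing in for a teacher and a student model.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]

p = softmax(teacher_logits)  # soft labels
q = softmax(student_logits)  # predictions being trained

kl = kl_divergence(p, q)
ce = soft_cross_entropy(p, q)
h = entropy(p)

# The two sides of the identity should agree up to floating-point error.
gap = abs(kl - (ce - h))
print(f"KL = {kl:.6f}, CE - H = {ce - h:.6f}, gap = {gap:.2e}")
```

Because H(p) does not depend on the student's predictions, minimizing the KL divergence with respect to q is the same as minimizing the soft-label cross-entropy when p is held fixed. The abstract's decoupled form goes further than this identity by also isolating a weighted MSE component.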

Code Implementations: 4 repos
