CV | LG | Jan 23, 2025

Regularizing cross entropy loss via minimum entropy and K-L divergence

arXiv:2501.13709v1 | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses classification accuracy for deep learning practitioners, offering incremental improvements over existing loss functions.

The paper introduces two loss functions for classification in deep learning, MIX-ENT and MIN-ENT, which regularize cross-entropy with minimum entropy and K-L divergence terms. On the EMNIST-Letters dataset, a VGG model trained with MIN-ENT reaches 95.933% accuracy and one trained with MIX-ENT reaches 95.927%, compared with 95.86% for VGG and 95.88% for Spinal-VGG under standard cross-entropy.

I introduce two novel loss functions for classification in deep learning. Both extend the standard cross-entropy loss by regularizing it with minimum entropy and Kullback-Leibler (K-L) divergence terms. The first is termed mixed entropy loss (MIX-ENT for short), and the second is termed minimum entropy regularized cross-entropy loss (MIN-ENT for short). The MIX-ENT function introduces a regularizer that can be shown to be equivalent to the sum of a minimum entropy term and a K-L divergence term. Note that this K-L divergence term differs from the one in the standard cross-entropy loss function in that it swaps the roles of the target probability and the hypothesis probability. The MIN-ENT function simply adds a minimum entropy regularizer to the standard cross-entropy loss function. In both MIX-ENT and MIN-ENT, the minimum entropy regularizer minimizes the entropy of the hypothesis probability distribution output by the neural network. Experiments on the EMNIST-Letters dataset show that my implementation of MIX-ENT and MIN-ENT lets the VGG model climb from its previous 3rd position on the paperswithcode leaderboard to the 2nd position, outperforming the Spinal-VGG model. Specifically, using standard cross-entropy, VGG achieves 95.86% classification accuracy and Spinal-VGG achieves 95.88%, whereas using VGG (without Spinal-VGG), MIN-ENT achieves 95.933% and MIX-ENT achieves 95.927%. The pre-trained models for both MIX-ENT and MIN-ENT are at https://github.com/rahmanoladi/minimum entropy project.
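
The two losses described above can be sketched in a few lines of plain Python. This is a minimal illustration based only on the abstract's description, not the author's implementation: the regularization weight `lam`, the label-smoothing factor `alpha` (used here so that the swapped K-L term stays finite for one-hot targets), and all function names are assumptions introduced for this example.

```python
import math

def softmax(logits):
    # Numerically stable softmax: the hypothesis distribution q.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(q, eps=1e-12):
    # Shannon entropy H(q) of the hypothesis distribution.
    return -sum(qi * math.log(qi + eps) for qi in q)

def cross_entropy(p, q, eps=1e-12):
    # Standard cross-entropy with target p and hypothesis q.
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def kl_divergence(a, b, eps=1e-12):
    # KL(a || b); terms with a_i == 0 contribute zero.
    return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))

def smooth(p, alpha=0.1):
    # Assumed label smoothing so that KL(q || p) is finite for one-hot p.
    k = len(p)
    return [(1 - alpha) * pi + alpha / k for pi in p]

def min_ent_loss(p, q, lam=0.1):
    # MIN-ENT as described: cross-entropy plus a minimum entropy regularizer.
    return cross_entropy(p, q) + lam * entropy(q)

def mix_ent_loss(p, q, lam=0.1, alpha=0.1):
    # MIX-ENT as described: the regularizer equals H(q) + KL(q || p),
    # i.e. the K-L term with target and hypothesis roles swapped.
    p_s = smooth(p, alpha)
    return cross_entropy(p, q) + lam * (entropy(q) + kl_divergence(q, p_s))

target = [0.0, 1.0, 0.0]          # one-hot target for class 1
q = softmax([1.0, 3.0, 0.5])      # hypothetical network output
print(min_ent_loss(target, q), mix_ent_loss(target, q))
```

In this sketch the MIX-ENT regularizer, H(q) + KL(q || p), collapses algebraically to the cross-entropy with the arguments swapped, which is one way to see why a single regularizer can be read as the sum of the two terms named in the abstract.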

Code Implementations: 1 repo