Alpha-divergence loss function for neural density ratio estimation
This work offers an incremental improvement for researchers in density ratio estimation, targeting optimization stability rather than broad accuracy gains.
The paper tackles optimization challenges in neural density ratio estimation by proposing a novel α-divergence loss function (α-Div). α-Div offers stable optimization but shows no significant accuracy advantage, measured by RMSE, over the KL-divergence loss.
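For reference, one standard variational form of the $\alpha$-divergence leads to a loss of the following shape. We assume, without confirmation from the text here, that α-Div matches it up to constants and sign conventions. The notation is ours: $P$ and $Q$ (densities $p$, $q$) are the numerator and denominator distributions, $T(x)$ is the network's estimate of $\log r(x) = \log p(x)/q(x)$, and $0 < \alpha < 1$.

$$\mathcal{L}_{\alpha}(T) = \frac{1}{\alpha}\,\mathbb{E}_{Q}\big[e^{\alpha T(x)}\big] + \frac{1}{1-\alpha}\,\mathbb{E}_{P}\big[e^{(\alpha-1)T(x)}\big]$$

Both terms are positive, so $\mathcal{L}_{\alpha} \ge 0$. Minimizing the integrand pointwise,

$$\frac{\partial}{\partial T}\Big[\tfrac{1}{\alpha}\,q\,e^{\alpha T} + \tfrac{1}{1-\alpha}\,p\,e^{(\alpha-1)T}\Big] = q\,e^{\alpha T} - p\,e^{(\alpha-1)T} = 0 \;\Longrightarrow\; e^{T(x)} = \frac{p(x)}{q(x)},$$

and the second derivative $\alpha\,q\,e^{\alpha T} + (1-\alpha)\,p\,e^{(\alpha-1)T}$ is positive, so this is the minimizer. Substituting it back gives

$$\min_{T}\mathcal{L}_{\alpha}(T) = \frac{1}{\alpha(1-\alpha)}\int p(x)^{\alpha}\,q(x)^{1-\alpha}\,dx = \frac{1}{\alpha(1-\alpha)} - D_{\alpha}(P\,\|\,Q), \qquad D_{\alpha}(P\,\|\,Q) = \frac{1 - \int p^{\alpha} q^{1-\alpha}\,dx}{\alpha(1-\alpha)}.$$

Since $0 \le \int p^{\alpha} q^{1-\alpha}\,dx \le 1$ (the upper bound by Hölder's inequality), this convention gives $0 \le D_{\alpha}(P\,\|\,Q) \le 1/(\alpha(1-\alpha))$, whereas the KL-divergence can be arbitrarily large.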
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions. State-of-the-art DRE methods estimate the density ratio using neural networks trained with loss functions derived from variational representations of $f$-divergences. However, existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and the large sample sizes required by Kullback–Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, an $f$-divergence that admits a suitable variational representation. From it we derive a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div). $\alpha$-Div is concise yet offers stable and effective optimization for DRE. The boundedness of $\alpha$-divergence makes successful DRE possible, in principle, for data exhibiting high KL-divergence. Our numerical experiments demonstrate the effectiveness of $\alpha$-Div in optimization. However, they also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of RMSE for DRE. This indicates that DRE accuracy is determined primarily by the magnitude of the KL-divergence in the data and depends less on the $\alpha$-divergence.
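The lower-unboundedness issue can be illustrated with a small, dependency-free sketch. This is not the paper's implementation. It assumes that α-Div takes the form $\frac{1}{\alpha}\mathbb{E}_Q[e^{\alpha T}] + \frac{1}{1-\alpha}\mathbb{E}_P[e^{(\alpha-1)T}]$ with $0<\alpha<1$ and $T \approx \log(p/q)$. It also assumes that the KL loss is the NWJ-style $-\mathbb{E}_P[T] + \mathbb{E}_Q[e^{T}] - 1$. A linear model $T(x) = ax + b$ stands in for the neural network. The toy data are two unit-variance Gaussians, $P = N(1,1)$ and $Q = N(0,1)$, whose true log ratio is $x - 0.5$. The names `alpha_div_loss`, `kl_loss`, and `fit_linear_log_ratio` are hypothetical.

```python
import math
import random


def alpha_div_loss(t_p, t_q, alpha):
    """Assumed alpha-Div loss (0 < alpha < 1) on precomputed log-ratio estimates.

    t_p: values of T(x), the estimated log ratio, on samples from P (numerator).
    t_q: values of T(x) on samples from Q (denominator).
    Both averaged terms are positive, so the loss is bounded below by 0.
    """
    term_q = sum(math.exp(alpha * t) for t in t_q) / len(t_q)
    term_p = sum(math.exp((alpha - 1.0) * t) for t in t_p) / len(t_p)
    return term_q / alpha + term_p / (1.0 - alpha)


def kl_loss(t_p, t_q):
    """Assumed NWJ-style KL loss: -E_P[T] + E_Q[exp(T)] - 1 (unbounded below)."""
    mean_t_p = sum(t_p) / len(t_p)
    mean_exp_q = sum(math.exp(t) for t in t_q) / len(t_q)
    return -mean_t_p + mean_exp_q - 1.0


def fit_linear_log_ratio(xs_p, xs_q, alpha=0.5, lr=0.3, steps=400):
    """Fit a hypothetical linear model T(x) = a*x + b by full-batch gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Weights exp(alpha*T) on Q samples and exp((alpha-1)*T) on P samples.
        # The 1/alpha and 1/(1-alpha) prefactors cancel the alpha and (alpha-1)
        # factors from the chain rule, so grad = E_Q[w_q * dT] - E_P[w_p * dT].
        wq = [math.exp(alpha * (a * x + b)) for x in xs_q]
        wp = [math.exp((alpha - 1.0) * (a * x + b)) for x in xs_p]
        grad_a = (sum(w * x for w, x in zip(wq, xs_q)) / len(xs_q)
                  - sum(w * x for w, x in zip(wp, xs_p)) / len(xs_p))
        grad_b = sum(wq) / len(wq) - sum(wp) / len(wp)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data: P = N(1, 1), Q = N(0, 1), so the true log ratio is x - 0.5.
random.seed(0)
xs_p = [random.gauss(1.0, 1.0) for _ in range(1000)]
xs_q = [random.gauss(0.0, 1.0) for _ in range(1000)]

a, b = fit_linear_log_ratio(xs_p, xs_q)
print(f"fitted T(x) = {a:.3f} * x {b:+.3f}   (true: 1.000 * x - 0.500)")

# A "memorizing" estimator: T = +c on every P sample, T = -c on every Q sample.
for c in (1.0, 10.0, 100.0):
    t_p, t_q = [c] * len(xs_p), [-c] * len(xs_q)
    print(f"c = {c:>5}: KL loss = {kl_loss(t_p, t_q):10.3f}, "
          f"alpha-Div loss = {alpha_div_loss(t_p, t_q, 0.5):.3e}")
```

As `c` grows, the memorizing estimator drives the assumed KL loss toward negative infinity, while the α-Div loss can only approach 0 from above. This mirrors, in miniature, the lower-unboundedness issue described above. The toy says nothing about RMSE, where the reported experiments find no significant advantage for α-Div.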