Pseudo-Spherical Contrastive Divergence
This work addresses a fundamental bottleneck in training energy-based models (EBMs): the intractable partition function. It offers a flexible and computationally efficient alternative to maximum likelihood training, though the contribution appears incremental because it builds on contrastive divergence.
The paper tackles the challenge of training energy-based models (EBMs) by proposing pseudo-spherical contrastive divergence (PS-CD), a method that generalizes maximum likelihood learning without computing the intractable partition function. Experiments on synthetic and image datasets demonstrate its effectiveness and its robustness to data contamination, and the authors report that it outperforms maximum likelihood and $f$-EBMs.
Energy-based models (EBMs) offer flexible distribution parametrization. However, due to the intractable partition function, they are typically trained via contrastive divergence for maximum likelihood estimation. In this paper, we propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of EBMs. PS-CD is derived from the maximization of a family of strictly proper homogeneous scoring rules, which avoids the computation of the intractable partition function and provides a generalized family of learning objectives that include contrastive divergence as a special case. Moreover, PS-CD allows us to flexibly choose various learning objectives to train EBMs without additional computational cost or variational minimax optimization. Theoretical analysis on the proposed method and extensive experiments on both synthetic data and commonly used image datasets demonstrate the effectiveness and modeling flexibility of PS-CD, as well as its robustness to data contamination, thus showing its superiority over maximum likelihood and $f$-EBMs.
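The role of homogeneity can be illustrated numerically. The sketch below is not taken from the paper: it uses the standard pseudo-spherical scoring rule, $S(x, q) = q(x)^{\gamma} / (\sum_y q(y)^{\gamma+1})^{\gamma/(\gamma+1)}$, on a small discrete state space, and the energies, the value of $\gamma$, and the function names are hypothetical choices made for illustration. It checks that the score is unchanged when the model density is rescaled, which is why the partition function never has to be computed, and that the expected score is highest when the model matches the data distribution.

```python
import numpy as np

def pseudo_spherical_score(q_unnorm, x, gamma):
    """Pseudo-spherical score of state index x under a (possibly unnormalized) pmf."""
    q = np.asarray(q_unnorm, dtype=float)
    # (sum_y q(y)^(gamma+1))^(gamma/(gamma+1)) rescales exactly like q(x)^gamma
    norm = np.sum(q ** (gamma + 1.0)) ** (gamma / (gamma + 1.0))
    return q[x] ** gamma / norm

def expected_score(p, q_unnorm, gamma):
    """Expected score of model q_unnorm when data follow the normalized pmf p."""
    scores = [pseudo_spherical_score(q_unnorm, x, gamma) for x in range(len(p))]
    return float(np.dot(p, scores))

# Hypothetical energies on four discrete states (illustrative values only).
energies = np.array([0.3, 1.2, -0.5, 2.0])
q_bar = np.exp(-energies)        # unnormalized density exp(-E(x))
q = q_bar / q_bar.sum()          # normalized density; requires the partition function

gamma = 1.0                      # assumed value; any gamma > 0 works here
s_unnorm = pseudo_spherical_score(q_bar, 2, gamma)
s_norm = pseudo_spherical_score(q, 2, gamma)
s_scaled = pseudo_spherical_score(7.5 * q_bar, 2, gamma)

# A mismatched model should score lower in expectation under data drawn from q.
other = np.array([1.0, 1.0, 1.0, 1.0])
best = expected_score(q, q_bar, gamma)
worse = expected_score(q, other, gamma)
```

Here `s_unnorm`, `s_norm`, and `s_scaled` agree up to floating-point error, so the score can be evaluated directly from `exp(-E(x))`, and `best` exceeds `worse`, consistent with the rule being strictly proper.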