Soft Learning Probabilistic Circuits
This work addresses a domain-specific problem for researchers and practitioners using PCs, offering an incremental improvement over the existing LearnSPN algorithm.
The paper tackles the problem of training Probabilistic Circuits (PCs) by proposing SoftLearn, a soft clustering-based learning procedure that improves upon the hard clustering method of LearnSPN. The results show that SoftLearn outperforms LearnSPN in many situations, yielding better likelihoods and arguably better samples.
Probabilistic Circuits (PCs) are prominent tractable probabilistic models, allowing for a range of exact inferences. This paper focuses on the main algorithm for training PCs, LearnSPN, a gold standard due to its efficiency, performance, and ease of use, in particular for tabular data. We show that LearnSPN is a greedy likelihood maximizer under mild assumptions. While inferences in PCs may use the entire circuit structure for processing queries, LearnSPN applies a hard method for learning them: at each sum node, a data point is propagated through one and only one of the children/edges, as in a hard clustering process. We propose a new learning procedure, SoftLearn, that induces a PC using a soft clustering process. We investigate the effect of this learning-inference compatibility in PCs. Our experiments show that SoftLearn outperforms LearnSPN in many situations, yielding better likelihoods and arguably better samples. We also analyze comparable tractable models to highlight the differences between soft/hard learning and model querying.
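To make the hard/soft distinction at a sum node concrete, here is a minimal Python sketch. It is not the SoftLearn or LearnSPN implementation. The abstract does not say which clustering method either algorithm uses, so the sketch assumes a sum node with two univariate Gaussian children and uses the mixture posterior as the clustering signal. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each child of the sum node given x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assignment(resp):
    """Hard clustering: the point is sent to exactly one child (one-hot).

    Ties are broken in favour of the first child.
    """
    best = max(range(len(resp)), key=lambda k: resp[k])
    return [1.0 if k == best else 0.0 for k in range(len(resp))]


def soft_assignment(resp):
    """Soft clustering: the point is shared across all children by weight."""
    return list(resp)


# Hypothetical sum node with two Gaussian children (illustrative values only).
weights, means, stds = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]

x = 1.0  # a point lying exactly between the two child means
resp = responsibilities(x, weights, means, stds)
hard = hard_assignment(resp)
soft = soft_assignment(resp)

print("hard:", hard)
print("soft:", soft)
```

For a point midway between the two children, the hard assignment commits the whole point to a single child, while the soft assignment splits it evenly. Under this assumed setup, that is the kind of information a hard learner discards at each sum node, even though inference in the resulting circuit still sums over all children.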