Symbolic Density Estimation: A Decompositional Approach
This work addresses the need for interpretable density estimation models, particularly in domains such as high-energy physics. The contribution is incremental, however: it extends symbolic regression to a new task.
The paper tackles symbolic density estimation, a largely unexplored problem, by introducing AI-Kolmogorov, a multi-stage pipeline that decomposes the problem and applies symbolic regression to density estimates. It demonstrates the pipeline on synthetic and exotic distributions, including some motivated by high-energy physics, where it either discovers the underlying distribution or provides insight into its mathematical form.
We introduce AI-Kolmogorov, a novel framework for Symbolic Density Estimation (SymDE). Symbolic regression (SR) has been used effectively to produce interpretable models in standard regression settings, but its applicability to density estimation tasks remains largely unexplored. To address the SymDE task, we introduce a multi-stage pipeline: (i) problem decomposition through clustering and/or probabilistic graphical model structure learning; (ii) nonparametric density estimation; (iii) support estimation; and finally (iv) SR on the density estimate. We demonstrate the efficacy of AI-Kolmogorov on synthetic mixture models, multivariate normal distributions, and three exotic distributions, two of which are motivated by applications in high-energy physics. We show that AI-Kolmogorov can discover underlying distributions or otherwise provide valuable insight into the mathematical expressions describing them.
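To make stages (ii) through (iv) concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the AI-Kolmogorov implementation. Stage (i) is skipped because the toy data has a single component, the support estimate is simply the sample range, and the SR step is replaced by a least-squares search over a small hypothetical library of candidate expressions (`CANDIDATES`). All function names, the bandwidth rule, and the parameter grids are assumptions made for illustration.

```python
import math
import random
import statistics

def gaussian_kde(samples, bandwidth):
    """Stage (ii): return x -> Gaussian-kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def estimate_support(samples):
    """Stage (iii): crude support estimate, the range of the observed samples."""
    return min(samples), max(samples)

# Hypothetical candidate library standing in for a real SR search space.
# Each entry maps a printable expression to a function of (x, a, b).
CANDIDATES = {
    "exp(-(x-a)^2/(2*b^2)) / (b*sqrt(2*pi))":
        lambda x, a, b: math.exp(-0.5 * ((x - a) / b) ** 2) / (b * math.sqrt(2.0 * math.pi)),
    "exp(-|x-a|/b) / (2*b)":
        lambda x, a, b: math.exp(-abs(x - a) / b) / (2.0 * b),
    "1 / (pi*b*(1 + ((x-a)/b)^2))":
        lambda x, a, b: 1.0 / (math.pi * b * (1.0 + ((x - a) / b) ** 2)),
}

def fit_candidates(density, support, n_grid=41):
    """Stage (iv) stand-in: pick the candidate and (a, b) with least squared error."""
    lo, hi = support
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    ys = [density(x) for x in xs]
    a_values = [-1.0 + 0.1 * i for i in range(21)]  # location grid, -1.0 to 1.0
    b_values = [0.5 + 0.1 * i for i in range(16)]   # scale grid, 0.5 to 2.0
    best = None
    for expr, f in CANDIDATES.items():
        for a in a_values:
            for b in b_values:
                err = sum((f(x, a, b) - y) ** 2 for x, y in zip(xs, ys))
                if best is None or err < best[0]:
                    best = (err, expr, a, b)
    return best

# Toy data: samples from a standard normal (an assumption for this sketch).
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Normal-reference rule of thumb for the kernel bandwidth.
bandwidth = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)

density = gaussian_kde(samples, bandwidth)
support = estimate_support(samples)
err, expr, a, b = fit_candidates(density, support)
print(f"best expression: {expr} with a={a:.1f}, b={b:.1f}, squared error={err:.4g}")
```

A genuine SR step would search over expression trees rather than a fixed list of three forms. The fixed library is used here only to keep the sketch short and dependency-free, and it shows why fitting to a smooth density estimate, restricted to the estimated support, turns density estimation into an ordinary regression target.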