ML · LG · Jul 31, 2025

Closed-Form Beta Distribution Estimation from Sparse Statistics with Random Forest Implicit Regularization

arXiv:2507.23767v2
Originality: Incremental advance
AI Analysis

This work addresses the problem of estimating distributions from limited data, with secondary ticket marketplaces as the motivating application. It appears incremental, since it builds on existing methods for sparse statistics and ensemble classification.

The paper tackles distribution recovery from sparse statistics by introducing a closed-form estimator for scaled beta distributions based on composite quantile and moment matching. Using the recovered parameters as features improves Random Forest classification accuracy on time-series snapshots, and error bounds link that accuracy to distributional closeness.
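As a rough illustration of what a closed-form fit from four summary statistics (minimum, maximum, mean, median) can look like, the sketch below rescales the statistics to [0, 1], matches the mean exactly, and matches the median through the standard approximation median ≈ (α − 1/3)/(α + β − 2/3), which is reasonable for α, β > 1. This pairing is an assumption chosen for illustration and is not necessarily the paper's estimator; the function name `fit_scaled_beta` is hypothetical.

```python
def fit_scaled_beta(lo, hi, mean, median):
    """Closed-form (alpha, beta) for a beta distribution scaled to [lo, hi].

    Illustrative assumption, not the paper's formula: match the mean exactly
    and the median via median ~ (alpha - 1/3) / (alpha + beta - 2/3).
    """
    if not hi > lo:
        raise ValueError("need hi > lo")
    m = (mean - lo) / (hi - lo)      # mean rescaled to [0, 1]
    q = (median - lo) / (hi - lo)    # median rescaled to [0, 1]
    if not (0.0 < m < 1.0 and 0.0 < q < 1.0):
        raise ValueError("mean and median must lie strictly inside (lo, hi)")
    if q == m:
        raise ValueError("mean == median: alpha + beta is not identified")
    # Solve m = a / s and q = (a - 1/3) / (s - 2/3) for s = a + b.
    s = (2.0 * q - 1.0) / (3.0 * (q - m))
    if s <= 0.0:
        raise ValueError("statistics are inconsistent with the approximation")
    return m * s, (1.0 - m) * s


# Statistics built from alpha=2, beta=5 on [10, 110] under the same approximation.
alpha, beta = fit_scaled_beta(10.0, 110.0, 10.0 + 100.0 * 2 / 7, 10.0 + 100.0 * 5 / 19)
print(round(alpha, 6), round(beta, 6))
```

When the mean and median coincide, the two equations no longer pin down α + β, so the sketch raises an error rather than guessing a concentration.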

This work advances distribution recovery from sparse data and ensemble classification through three main contributions. First, we introduce a closed-form estimator that reconstructs scaled beta distributions from limited statistics (minimum, maximum, mean, and median) via composite quantile and moment matching. The recovered parameters $(α,β)$, when used as features in Random Forest classifiers, improve pairwise classification on time-series snapshots, validating the fidelity of the recovered distributions. Second, we establish a link between classification accuracy and distributional closeness by deriving error bounds that constrain total variation distance and Jensen-Shannon divergence, the latter exhibiting quadratic convergence. Third, we show that zero-variance features act as an implicit regularizer, increasing selection probability for mid-ranked predictors and producing deeper, more varied trees. A SeatGeek pricing dataset serves as the primary application, illustrating distributional recovery and event-level classification while situating these methods within the structure and dynamics of the secondary ticket marketplace. The UCI handwritten digits dataset confirms the broader regularization effect. Overall, the study outlines a practical route from sparse distributional snapshots to closed-form estimation and improved ensemble accuracy, with reliability enhanced through implicit regularization.
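One way to build intuition for the third contribution is a toy sampling model of feature selection at a single split. The sketch below is an assumption for illustration, not the paper's analysis: each split samples `max_features` candidates uniformly without replacement, the best-ranked informative feature in the sample wins, and constant (zero-variance) features never win. The function name `best_rank_probability` and the feature counts are hypothetical, and real Random Forest implementations treat constant features in their own ways.

```python
from math import comb


def best_rank_probability(rank, n_informative, n_constant, max_features):
    """P(the rank-th best informative feature wins a split) in a toy model.

    Conditioned on the sample containing at least one informative feature.
    """
    total = n_informative + n_constant
    # The rank-th feature is drawn and none of the (rank - 1) better ones are.
    wins = comb(total - rank, max_features - 1)
    # Samples made up only of constant features produce no split at all.
    usable = comb(total, max_features) - comb(n_constant, max_features)
    return wins / usable


# Compare 10 informative features alone against the same 10 plus 10 constants.
for n_constant in (0, 10):
    probs = [best_rank_probability(r, 10, n_constant, 3) for r in range(1, 11)]
    print(n_constant, [round(p, 3) for p in probs])
```

In this toy setting, padding the feature set with constants lowers the top-ranked feature's share of splits and raises the share of mid- and lower-ranked ones, which is consistent with the dilution effect the abstract describes.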

