Estimating Density Models with Truncation Boundaries using Score Matching
This paper addresses parameter estimation for truncated data, a setting in which intractable normalizing constants make standard likelihood-based fitting impractical for researchers and practitioners.
The paper tackles parameter estimation for truncated density models, where traditional methods such as Maximum Likelihood Estimation are infeasible because the normalizing constants are intractable. It proposes a weighted Score Matching estimator that minimizes a Fisher divergence, with a weight function based on the distance to the domain boundary. The method's utility is demonstrated through numerical experiments and an application to a Chicago crime data set, and it is also shown to correct outlier-trimming bias.
Truncated densities are probability density functions defined on truncated domains. They share the same parametric form as their non-truncated counterparts up to a normalizing constant. Since computing these normalizing constants is usually infeasible, Maximum Likelihood Estimation cannot easily be applied to estimate truncated density models. Score Matching (SM) is a powerful tool for fitting parameters using only unnormalized models. However, it cannot be applied directly here, because the boundary conditions used to derive a tractable SM objective are not satisfied by truncated densities. In this paper, we study parameter estimation for truncated probability densities using SM. The estimator minimizes a weighted Fisher divergence, where the weight function is simply the shortest distance from a data point to the boundary of the domain. We show that this choice of weight function arises naturally from minimizing the Stein discrepancy as well as from upper-bounding the finite-sample estimation error. The usefulness of our method is demonstrated by numerical experiments and a study on the Chicago crime data set. We also show that the proposed density estimation can correct the outlier-trimming bias caused by aggressive outlier detection methods.
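To make the idea of a boundary-distance weight concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional Gaussian with known standard deviation, truncated to an interval [a, b], and uses the standard weighted score matching form obtained by integration by parts, which needs the weight to vanish on the boundary. The function names, the interval, and the sample size are all illustrative assumptions. For this simple model the objective is quadratic in the mean, so the minimizer has a closed form and the normalizing constant is never needed.

```python
import numpy as np


def boundary_distance(x, a, b):
    """Weight g(x): shortest distance from x to the boundary of [a, b].

    Also returns its (weak) derivative g'(x), which is +1 nearer to a
    and -1 nearer to b.
    """
    g = np.minimum(x - a, b - x)
    dg = np.where(x - a <= b - x, 1.0, -1.0)
    return g, dg


def fit_truncated_gaussian_mean(x, a, b, sigma):
    """Weighted score matching estimate of the mean (illustrative sketch).

    With score psi(x) = -(x - mu) / sigma**2, the empirical objective
        mean(g * psi**2 + 2 * g * psi' + 2 * g' * psi)
    is quadratic in mu. Setting its derivative to zero gives
        mu = (mean(g * x) - sigma**2 * mean(g')) / mean(g).
    """
    g, dg = boundary_distance(x, a, b)
    return (np.mean(g * x) - sigma**2 * np.mean(dg)) / np.mean(g)


# Simulate truncated data by rejection: keep only draws inside [a, b].
rng = np.random.default_rng(0)
true_mu, sigma, a, b = 1.0, 1.0, -0.5, 2.0  # assumed toy settings
draws = rng.normal(true_mu, sigma, size=600_000)
x = draws[(draws >= a) & (draws <= b)]

mu_hat = fit_truncated_gaussian_mean(x, a, b, sigma)
naive_mean = x.mean()  # ignores truncation, so it is biased toward the interval

print(f"weighted score matching estimate: {mu_hat:.3f}")
print(f"naive sample mean:                {naive_mean:.3f}")
```

In this toy setting the naive sample mean is pulled away from the true mean because part of the upper tail has been cut off, while the weighted estimate recovers it without evaluating the truncated normalizing constant. The same reasoning is what lets a truncated fit compensate for data removed by aggressive outlier trimming.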