Nonparametric Deconvolution Models
This work addresses the challenge of analyzing aggregated data with hidden heterogeneity, such as election returns, and offers a method for deconvolving each observation into its constituent factors, though as an extension of hierarchical Dirichlet processes it may appear incremental.
The paper tackles the problem of modeling data in which each observation is an average over heterogeneous particles, such as precinct-level vote tallies. It introduces nonparametric deconvolution models (NDMs), which recover how factor distributions vary locally for each observation. The results show that including local factors improves estimates of global factors and provides a novel scaffold for exploring data, as demonstrated on simulated data and California voting data.
We describe nonparametric deconvolution models (NDMs), a family of Bayesian nonparametric models for collections of data in which each observation is the average over the features from heterogeneous particles. Such data arise, for example, in elections, where we observe precinct-level vote tallies (observations) of individual citizens' votes (particles) across each of the candidates or ballot measures (features), and each voter belongs to a specific voter cohort or demographic (factor). Like the hierarchical Dirichlet process, NDMs rely on two tiers of Dirichlet processes to explain the data with an unknown number of latent factors; each observation is modeled as a weighted average of these latent factors. Unlike existing models, NDMs recover how factor distributions vary locally for each observation. This uniquely allows NDMs both to deconvolve each observation into its constituent factors and to describe how the factor distributions specific to each observation vary across observations and deviate from the corresponding global factors. We present variational inference techniques for this family of models and study their performance on simulated data and voting data from California. We show that including local factors improves estimates of global factors and provides a novel scaffold for exploring data.
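To make the "average over heterogeneous particles" idea concrete, the following is a minimal forward-sampling sketch of an NDM-style generative process. It is not the authors' model or inference code; it only illustrates the structure described above, under several loudly stated assumptions: factors are Gaussian in a `dim`-dimensional feature space, the global Dirichlet process is truncated to `k_max` atoms via stick-breaking, the per-observation DP is approximated by a Dirichlet with parameters `alpha * beta`, and each local factor is a Gaussian perturbation (scale `tau`) of its global factor. All function and parameter names (`stick_breaking`, `local_weights`, `sample_ndm`, `tau`, `sigma`) are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical illustration of an NDM-style generative process (not the paper's code).


def stick_breaking(gamma: float, k_max: int, rng: random.Random) -> List[float]:
    """Truncated GEM(gamma) weights: the global tier of the two-tier DP."""
    weights, remaining = [], 1.0
    for _ in range(k_max - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover stick mass
    return weights


def local_weights(alpha: float, beta: List[float], rng: random.Random) -> List[float]:
    """Finite approximation of DP(alpha, beta) via Dirichlet(alpha * beta)."""
    # Shapes are floored at 0.05 only as a numerical safeguard (assumption).
    draws = [rng.gammavariate(max(alpha * b, 0.05), 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws] if total > 0 else list(beta)


def sample_ndm(n_obs: int = 4, n_particles: int = 5000, dim: int = 3,
               k_max: int = 6, gamma: float = 2.0, alpha: float = 5.0,
               tau: float = 0.1, sigma: float = 0.1, seed: int = 0
               ) -> Tuple[List[float], List[List[float]],
                          List[Tuple[List[float], List[float], List[List[float]]]]]:
    rng = random.Random(seed)
    beta = stick_breaking(gamma, k_max, rng)
    # Global factors: one mean vector per atom (Gaussian factors are an assumption).
    global_mu = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k_max)]
    observations = []
    for _ in range(n_obs):
        pi = local_weights(alpha, beta, rng)
        # Local factors: observation-specific deviations from each global factor.
        local_mu = [[m + rng.gauss(0.0, tau) for m in mu] for mu in global_mu]
        # Each particle picks a factor according to pi, then draws its features.
        assignments = rng.choices(range(k_max), weights=pi, k=n_particles)
        totals = [0.0] * dim
        for k in assignments:
            for d in range(dim):
                totals[d] += rng.gauss(local_mu[k][d], sigma)
        y = [t / n_particles for t in totals]  # only the average is observed
        observations.append((y, pi, local_mu))
    return beta, global_mu, observations
```

As a usage check, the observed average `y` for each observation should lie close to the `pi`-weighted average of that observation's local factor means, which is the quantity a deconvolution method would try to take apart:

```python
beta, global_mu, obs = sample_ndm()
y, pi, local_mu = obs[0]
expected = [sum(p * mu[d] for p, mu in zip(pi, local_mu)) for d in range(3)]
print(y, expected)
```

Inference in the paper runs in the opposite direction (recovering weights and local factors from the averages via variational inference); this sketch covers only the forward direction.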