ML AP CO | Feb 4, 2018

Using Poisson Binomial GLMs to Reveal Voter Preferences

arXiv:1802.01053v12.72 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of inferring individual voter preferences from aggregate data. It is an incremental advance: it builds on existing ecological inference methods with a new modeling approach.

The authors approach ecological inference by modeling aggregate count data with a Poisson binomial distribution and linking individual-level probabilities to covariates through logistic regression and neural networks. They fit the model to a sample of over four million voters from Pennsylvania in the 2016 election, then validate it at the precinct level on a holdout set and at the individual level using weak labels.

We present a new modeling technique for solving the problem of ecological inference, in which individual-level associations are inferred from labeled data available only at the aggregate level. We model aggregate count data as arising from the Poisson binomial, the distribution of the sum of independent but not identically distributed Bernoulli random variables. We relate individual-level probabilities to individual covariates using both a logistic regression and a neural network. A normal approximation is derived via the Lyapunov Central Limit Theorem, allowing us to efficiently fit these models on large datasets. We apply this technique to the problem of revealing voter preferences in the 2016 presidential election, fitting a model to a sample of over four million voters from the highly contested swing state of Pennsylvania. We validate the model at the precinct level via a holdout set, and at the individual level using weak labels, finding that the model is predictive and it learns intuitively reasonable associations.
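The core idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a logistic link only and a plain normal density with no continuity correction, and it runs on synthetic data. All names (`group_moments`, `aggregate_nll`, `precinct`, `beta_true`) are hypothetical. Each precinct count is treated as approximately normal, with mean equal to the sum of the individual probabilities p_i and variance equal to the sum of p_i(1 - p_i), which are the exact mean and variance of the Poisson binomial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_moments(p, group, n_groups):
    """Exact mean and variance of each group's Poisson binomial count."""
    mean = np.bincount(group, weights=p, minlength=n_groups)
    var = np.bincount(group, weights=p * (1.0 - p), minlength=n_groups)
    return mean, var

def aggregate_nll(beta, X, group, counts, n_groups):
    """Negative log-likelihood of the group counts under the normal
    approximation (assumed form: plain Gaussian density per group)."""
    p = sigmoid(X @ beta)                      # individual-level probabilities
    mean, var = group_moments(p, group, n_groups)
    var = np.maximum(var, 1e-9)                # guard against zero variance
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (counts - mean) ** 2 / var)

# Synthetic example (hypothetical data, not the Pennsylvania voter file).
rng = np.random.default_rng(0)
n_voters, n_precincts = 20_000, 100
X = np.column_stack([np.ones(n_voters), rng.normal(size=(n_voters, 2))])
precinct = rng.integers(0, n_precincts, size=n_voters)
beta_true = np.array([0.5, 1.0, -1.0])

# Individual votes are simulated, but only precinct totals enter the likelihood.
votes = (rng.random(n_voters) < sigmoid(X @ beta_true)).astype(float)
counts = np.bincount(precinct, weights=votes, minlength=n_precincts)

nll_true = aggregate_nll(beta_true, X, precinct, counts, n_precincts)
nll_wrong = aggregate_nll(-beta_true, X, precinct, counts, n_precincts)
print(nll_true, nll_wrong)
```

Because the objective depends on the individual probabilities only through two per-group sums, it can be evaluated in time linear in the number of individuals. Minimizing `aggregate_nll` over `beta` with any gradient-based optimizer would give a fit in this simplified setting.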
