Infinitely imbalanced binomial regression and deformed exponential families
This work addresses theoretical convergence properties of statistical models for imbalanced data; it is incremental in that it extends known results to a broader class of link functions.
The paper uses extreme value theory to demonstrate that binomial regression models with various link functions converge to Poisson point process models under infinite imbalance. It shows that the intensity measures form exponential or deformed exponential families, and it proposes a penalized maximum likelihood estimator for the Poisson model.
The logistic regression model is known to converge to a Poisson point process model when the binary response becomes infinitely imbalanced. This paper shows that the phenomenon is universal across a wide class of link functions for binomial regression. The proof relies on extreme value theory. For the logit, probit, and complementary log-log link functions, the intensity measure of the point process forms an exponential family. For some other link functions, deformed exponential families appear. A penalized maximum likelihood estimator for the Poisson point process model is suggested.
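
The exponential-family form of the limiting intensity can be made plausible numerically for two of the links named above. The following is a minimal sketch, not the paper's proof (which relies on extreme value theory): it assumes a single scalar covariate x and a linear predictor alpha + beta * x, and it only checks the pointwise fact that, as the intercept alpha tends to minus infinity, the success probability under the logit and complementary log-log links approaches exp(alpha + beta * x). All function names and parameter values are hypothetical and chosen for illustration.

```python
import math


def logit_prob(alpha, beta, x):
    """Success probability under the logit link."""
    eta = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_prob(alpha, beta, x):
    """Success probability under the complementary log-log link."""
    eta = alpha + beta * x
    # 1 - exp(-exp(eta)), computed with expm1 for accuracy when eta is very negative
    return -math.expm1(-math.exp(eta))


def exp_intensity(alpha, beta, x):
    """Exponential-family approximation exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)


def relative_gap(prob_fn, alpha, beta, x):
    """Relative difference between a link's probability and the exponential form."""
    approx = exp_intensity(alpha, beta, x)
    return abs(prob_fn(alpha, beta, x) - approx) / approx


if __name__ == "__main__":
    beta, x = 1.0, 0.5  # illustrative values only
    for alpha in (-2.0, -5.0, -10.0):
        print(
            alpha,
            relative_gap(logit_prob, alpha, beta, x),
            relative_gap(cloglog_prob, alpha, beta, x),
        )
```

The printed gaps shrink as alpha decreases, which is the sense in which rare successes behave like a point process whose intensity is log-linear in x. The probit link and the links that lead to deformed exponential families need a different treatment of the tail and are not covered by this sketch.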