$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets
This work addresses scalable statistical modeling of binary responses with tunable sensitivity to outliers, but it is incremental in that it builds on existing techniques such as sketching and coresets.
The authors tackle the problem of estimating parameters in a generalized probit regression model with flexible tail behavior, and they develop an efficient method that approximates the maximum likelihood estimator up to a factor of $(1+\varepsilon)$ on large datasets using sketching and coresets.
We study the $p$-generalized probit regression model, a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal cdf, with the cdf of a $p$-generalized normal distribution for $p\in[1, \infty)$. The $p$-generalized normal distributions \citep{Sub23} are of special interest in statistical modeling because they fit data much more flexibly. Their tail behavior can be controlled by the choice of the parameter $p$, which influences the model's sensitivity to outliers. Special cases include the Laplace ($p=1$) and the Gaussian ($p=2$) distributions, with the uniform distribution arising in the limit $p\to\infty$. We further show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data by combining sketching techniques with importance subsampling to obtain a small data summary called a coreset.
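To make the model concrete, the following is a minimal sketch of the $p$-generalized probit link and a plain (unsketched) maximum likelihood fit. It assumes a density proportional to $\exp(-|x|^p/p)$, so that $p=2$ gives the standard normal and $p=1$ the Laplace distribution; this parameterization, the labels $y_i\in\{-1,+1\}$, and the helper names `pgn_cdf`, `neg_log_likelihood`, and `fit_mle` are illustrative assumptions, not taken from the paper. The sketching and coreset construction is not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gennorm, norm


def pgn_cdf(x, p):
    """CDF of the p-generalized normal distribution.

    Assumed parameterization: density proportional to exp(-|x|^p / p).
    SciPy's gennorm uses exp(-|x/scale|^beta), so scale = p^(1/p) matches it.
    """
    return gennorm.cdf(x, p, scale=p ** (1.0 / p))


def neg_log_likelihood(beta, X, y, p):
    """Negative log-likelihood for labels y in {-1, +1}.

    By symmetry of the distribution, P(y_i | x_i) = Phi_p(y_i * <x_i, beta>).
    """
    z = y * (X @ beta)
    return -np.sum(gennorm.logcdf(z, p, scale=p ** (1.0 / p)))


def fit_mle(X, y, p):
    """Fit beta on the full data with a generic quasi-Newton optimizer."""
    beta0 = np.zeros(X.shape[1])
    result = minimize(neg_log_likelihood, beta0, args=(X, y, p), method="BFGS")
    return result.x
```

In this sketch, a smaller $p$ gives the link heavier tails, so a point far on the wrong side of the decision boundary contributes less to the loss than it would under the standard probit model. A coreset-based variant would, under the same assumptions, minimize a weighted version of `neg_log_likelihood` over a small subsample instead of the full data.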