Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
This work provides theoretical insights into dropout regularization for statisticians and machine learning practitioners, though the contribution is incremental: it extends an earlier result for conventional generalized linear models to a more complex model class.
The paper studies dropout regularization in extended generalized linear models based on double exponential families, in which the dispersion can vary with the features, and shows that dropout favors rare but important features in both the mean and the dispersion. In experiments on adaptive smoothing with B-splines, dropout improves on penalized maximum likelihood with an explicit smoothness penalty; an application to traffic detection data from Berlin further illustrates the method.
Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion, generalizing an earlier result for conventional generalized linear models. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modeled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty. An application to traffic detection data from Berlin further illustrates the benefits of our method.
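To make the idea of dropout acting on both the mean and the dispersion concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Gaussian model whose mean and log-variance are each linear in their own feature matrix, used here as a simple stand-in for a double exponential family model. The function names, the separate dropout rates, and the Monte Carlo averaging over masks are all illustrative assumptions.

```python
import numpy as np


def dropout_features(X, rate, rng):
    """Zero each entry of X independently with probability `rate`.

    Surviving entries are rescaled by 1 / (1 - rate), so the masked
    matrix equals X in expectation.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    keep = rng.random(X.shape) >= rate
    return X * keep / (1.0 - rate)


def gaussian_nll(y, X, Z, beta, gamma):
    """Average negative log-likelihood of a heteroscedastic Gaussian.

    Assumed model: mean = X @ beta, log-variance = Z @ gamma.
    """
    mu = X @ beta
    log_var = Z @ gamma
    sq_err = (y - mu) ** 2
    return 0.5 * np.mean(log_var + sq_err / np.exp(log_var) + np.log(2.0 * np.pi))


def dropout_nll(y, X, Z, beta, gamma, rate_mean, rate_disp, rng, n_draws=50):
    """Monte Carlo estimate of the dropout objective.

    Independent masks are drawn for the mean features X and the
    dispersion features Z, and the loss is averaged over the draws.
    """
    total = 0.0
    for _ in range(n_draws):
        X_drop = dropout_features(X, rate_mean, rng)
        Z_drop = dropout_features(Z, rate_disp, rng)
        total += gaussian_nll(y, X_drop, Z_drop, beta, gamma)
    return total / n_draws
```

In an adaptive smoothing setting, `X` and `Z` would hold B-spline basis functions evaluated at the inputs, and `dropout_nll` would be minimized over `beta` and `gamma`. With both rates set to zero the objective reduces to the ordinary negative log-likelihood.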