R2VF: A Two-Step Regularization Algorithm to Cluster Categories in GLMs
This work targets a specific computational bottleneck in GLMs, the cost of clustering nominal categories, and offers researchers and practitioners an incremental improvement over existing regularization and clustering techniques.
The paper tackles the challenge of clustering nominal categories in Generalized Linear Models (GLMs) without high computational cost. It introduces R2VF, a two-step method that first transforms nominal features into an ordinal framework and then applies variable fusion. Comparisons with other methods are used to demonstrate its effectiveness in addressing overfitting and identifying an appropriate set of covariates.
Over recent decades, extensive research has aimed to overcome the restrictive underlying assumptions required for a Generalized Linear Model to generate accurate and meaningful predictions. These efforts include regularizing coefficients, selecting features, and clustering ordinal categories, among other approaches. Despite these advances, efficiently clustering nominal categories in GLMs without incurring high computational costs remains a challenge. This paper introduces Ranking to Variable Fusion (R2VF), a two-step method designed to efficiently fuse nominal and ordinal categories in GLMs. By first transforming nominal features into an ordinal framework via regularized regression and then applying variable fusion, R2VF strikes a balance between model complexity and interpretability. We demonstrate the effectiveness of R2VF through comparisons with other methods, highlighting its performance in addressing overfitting and identifying an appropriate set of covariates.
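To make the two-step idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian response with identity link and a single nominal feature. Step 1 is approximated by ridge-shrunk per-category effects used only to rank the categories. Step 2 is approximated by a lasso on split-coded (cumulative) dummies of the ranked categories, so that a zero coefficient fuses two adjacent categories. The function names (`rank_categories`, `fuse_adjacent`, `r2vf_sketch`), the penalty parameters `ridge` and `lam`, and the toy data are all hypothetical choices made for illustration; the abstract does not specify the penalties or solver actually used.

```python
from collections import defaultdict


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def rank_categories(cats, y, ridge=1.0):
    """Step 1 (assumed form): order categories by ridge-shrunk effects.

    For one-hot dummies on a centered response, the ridge solution for
    category k is sum(y_i - mean) / (n_k + ridge).
    """
    mean_y = sum(y) / len(y)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, yi in zip(cats, y):
        sums[c] += yi - mean_y
        counts[c] += 1
    effects = {c: sums[c] / (counts[c] + ridge) for c in sums}
    return sorted(effects, key=effects.get)


def fuse_adjacent(ranks, y, n_levels, lam=0.1, n_iter=200):
    """Step 2 (assumed form): lasso on split-coded ordinal dummies.

    Column j is 1 when rank >= j + 1, so its coefficient is the jump between
    adjacent ranked categories; a zero jump means the two are fused.
    Objective: (1 / 2n) * RSS + lam * sum(|delta_j|), intercept unpenalized.
    """
    n = len(y)
    cols = [[1.0 if r >= j else 0.0 for r in ranks] for j in range(1, n_levels)]
    norms = [sum(c * c for c in col) / n for col in cols]
    delta = [0.0] * (n_levels - 1)
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        # Unpenalized intercept update.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        # Cyclic coordinate descent over the jump coefficients.
        for j, col in enumerate(cols):
            if norms[j] == 0.0:
                continue
            rho = sum(c * (r + c * delta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norms[j]
            diff = new - delta[j]
            if diff != 0.0:
                resid = [r - c * diff for r, c in zip(resid, col)]
                delta[j] = new
    return b0, delta


def r2vf_sketch(cats, y, ridge=1.0, lam=0.1):
    """Rank nominal categories, then fuse adjacent ones with zero jumps."""
    order = rank_categories(cats, y, ridge)
    rank_of = {c: i for i, c in enumerate(order)}
    ranks = [rank_of[c] for c in cats]
    _, delta = fuse_adjacent(ranks, y, len(order), lam)
    groups = [[order[0]]]
    for c, d in zip(order[1:], delta):
        if abs(d) < 1e-8:
            groups[-1].append(c)  # zero jump: same cluster as previous category
        else:
            groups.append([c])
    return groups


# Hypothetical toy data: four categories, two with low and two with high means.
cats = ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + ["d"] * 3
y = [0.9, 1.0, 1.1, 1.1, 1.2, 1.3, 4.9, 5.0, 5.1, 5.1, 5.2, 5.3]
groups = r2vf_sketch(cats, y, ridge=1.0, lam=0.1)
print(groups)
```

On this toy data the two low-mean categories and the two high-mean categories end up fused, while setting `lam=0` keeps all four separate. The ranking step is what makes the fusion cheap in this sketch: once the categories are ordered, only the K - 1 adjacent differences are penalized rather than all K(K - 1)/2 pairwise differences.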