Practical considerations for variable screening in the super learner
This work addresses a practical issue for data analysts using ensemble methods, but its contribution is incremental: it extends existing guidance on algorithm diversity from prediction algorithms to variable screening.
The study examines variable screening within the super learner ensemble. Its empirical results suggest that relying solely on the lasso for dimension reduction can lead to poor performance in settings where the lasso is known to be ineffective, and it recommends using a diverse set of screeners to mitigate this risk. The recommendation is illustrated with HIV-1 antibody data.
Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms (screeners), including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a super learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results suggesting that a diverse set of candidate screeners should be used to protect against poor performance of any one screener, mirroring the guidance for choosing a library of prediction algorithms for the super learner. These results are further illustrated through the analysis of HIV-1 antibody data.
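
As a rough illustration of pairing several screeners with several prediction algorithms, the sketch below builds a small library of screener-learner pairs and compares their cross-validated risk. It is not the authors' implementation: the data are simulated, every function name (`screen_all`, `screen_corr`, `screen_var`, `fit_mean`, `fit_knn`, `cv_risks`) is hypothetical, simple correlation and variance screeners stand in for the lasso, and the final step selects the single best pair (a discrete super learner) rather than estimating ensemble weights. Screening is repeated inside each training fold so that held-out observations never influence which variables are kept.

```python
import random
from statistics import mean, pvariance

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

# Screeners (hypothetical): each returns the column indices to keep.
def screen_all(X, y):
    return list(range(len(X[0])))

def screen_corr(X, y, k=2):
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: -scores[j])[:k])

def screen_var(X, y, k=2):
    # Ignores y entirely, so it can easily keep uninformative columns.
    p = len(X[0])
    scores = [pvariance([row[j] for row in X]) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: -scores[j])[:k])

# Learners (hypothetical): each returns a function that predicts one row.
def fit_mean(X, y):
    m = mean(y)
    return lambda row: m

def fit_knn(X, y, k=5):
    def predict(row):
        dist = sorted((sum((a - b) ** 2 for a, b in zip(r, row)), t)
                      for r, t in zip(X, y))
        return mean([t for _, t in dist[:k]])
    return predict

def cv_risks(X, y, screeners, learners, n_folds=5):
    """Cross-validated mean squared error for every screener-learner pair."""
    n = len(y)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    sq_err = {(s, l): 0.0 for s in screeners for l in learners}
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for s_name, screen in screeners.items():
            cols = screen(X_tr, y_tr)  # screening uses the training fold only
            X_tr_s = [[row[j] for j in cols] for row in X_tr]
            for l_name, fit in learners.items():
                predict = fit(X_tr_s, y_tr)
                for i in held_out:
                    pred = predict([X[i][j] for j in cols])
                    sq_err[(s_name, l_name)] += (y[i] - pred) ** 2
    return {pair: total / n for pair, total in sq_err.items()}

# Simulated data (an assumption for illustration): 2 signal columns, 8 noise columns.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
y = [2 * row[0] - row[1] + rng.gauss(0, 0.5) for row in X]

screeners = {"all": screen_all, "corr_top2": screen_corr, "var_top2": screen_var}
learners = {"mean": fit_mean, "knn": fit_knn}
risks = cv_risks(X, y, screeners, learners)
best = min(risks, key=risks.get)  # discrete super learner: lowest CV risk
print(best, round(risks[best], 3))
```

Because every screener is evaluated with every learner, a screener that performs poorly on a given data set simply receives a high cross-validated risk and is not selected, which is the protection that a diverse set of screeners is meant to provide.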