Novel GPU Boruta algorithms for feature selection from high-dimensional data
This work addresses the computational bottleneck of wrapper-based feature selection for large-scale datasets by leveraging GPU parallelism, offering a practical speedup for data scientists.
Two GPU-accelerated versions of the Boruta feature selection algorithm (Boruta-Permut and Boruta-TreeImp) were proposed to improve computational efficiency on high-dimensional data. Experiments showed that the GPU versions greatly improved speed while maintaining accuracy comparable to the original Boruta, though Boruta-TreeImp could overestimate the importance of some features.
Most feature selection algorithms, especially wrapper methods, run inefficiently on CPU-based platforms because of their high computational complexity, which makes them unsuitable for processing large-scale datasets. To address this challenge, this study proposes two GPU-accelerated versions of the Boruta feature selection procedure: Boruta-Permut, which relies on permutation-based feature importance, and Boruta-TreeImp, which uses importance based on impurity reduction. To evaluate these methods, we conducted experiments on a self-constructed dataset and several publicly available datasets. The results show that the proposed GPU-accelerated algorithms greatly improve computational efficiency while preserving feature selection accuracy comparable to that of the original Boruta algorithm. We also observe that the impurity-reduction-based version can overestimate the importance of some features. Overall, these findings suggest that performing Boruta feature selection on GPUs offers an effective and cost-efficient solution for large-scale data analysis.
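For readers unfamiliar with Boruta, the shadow-feature comparison at its core can be sketched as follows. This is a minimal CPU illustration and not the GPU implementation described here: the importance score is a stand-in (absolute Pearson correlation with the target) rather than the permutation-based or impurity-based random-forest importance used by Boruta-Permut and Boruta-TreeImp, and the names `abs_correlation` and `boruta_hits` are hypothetical.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Stand-in importance score: absolute Pearson correlation with the target."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def boruta_hits(columns, target, n_rounds=50, seed=0):
    """Count, per feature, how often its score beats the best shadow score.

    A shadow feature is a shuffled copy of a real feature, so any score it
    achieves is due to chance. A real feature scores a "hit" in a round when
    it outperforms the strongest shadow of that round.
    """
    rng = random.Random(seed)
    hits = [0] * len(columns)
    # The stand-in score is deterministic, so real scores are computed once.
    # With a random forest they would be re-estimated in every round.
    real_scores = [abs_correlation(col, target) for col in columns]
    for _ in range(n_rounds):
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]          # copy, then destroy the link to the target
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_correlation(shadow, target))
        for j, score in enumerate(real_scores):
            if score > shadow_max:
                hits[j] += 1
    return hits


# Toy data (assumed for illustration): one informative and one noise feature.
data_rng = random.Random(42)
n_samples = 200
signal = [data_rng.gauss(0, 1) for _ in range(n_samples)]
noise = [data_rng.gauss(0, 1) for _ in range(n_samples)]
target = [s + 0.1 * data_rng.gauss(0, 1) for s in signal]

hits = boruta_hits([signal, noise], target, n_rounds=50)
print("hits per feature (signal, noise):", hits)
```

In the full procedure, the hit counts feed a statistical test that marks each feature as confirmed or rejected. Each round is independent of the others, and the importance computation dominates the cost, which is why the procedure is a natural candidate for GPU parallelism.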