Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)
This work addresses feature selection for biometric authentication data. It is incremental: it builds on existing screening methods while offering specific advantages over them.
The paper tackles feature selection for ultrahigh-dimensional data with thousands of classes, such as biometric authentication data, by proposing RFMS, a random forest-based multiround screening method whose performance is on par with industry-standard methods.
In recent years, numerous screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features; however, most of these methods cannot handle data with thousands of classes. Prediction models that authenticate users based on multichannel biometric data give rise to exactly this type of problem. In this study, we present a novel method, random forest-based multiround screening (RFMS), that can be applied effectively under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and selection of features based on their importance. To benchmark RFMS, we employ BiometricBlender, a synthetic biometric feature space generator. The results show that RFMS is on par with industry-standard feature screening methods while also possessing many advantages over them.
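The subset-then-tournament loop described above can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' RFMS implementation. The function names, the parameters (`subset_size`, `keep_per_subset`, `target`), and the fixed-quota rule for how many features advance from each subset are all assumptions. To keep the sketch runnable with the Python standard library alone, it swaps in a simple univariate class-separation score at the point where RFMS would fit a random forest on each subset and rank features by importance.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: given the column indices of ONE subset,
# return an importance score per column, as a partial model build would.
Scorer = Callable[[Sequence[int]], Dict[int, float]]


def tournament_screen(n_features: int, scorer: Scorer, subset_size: int,
                      keep_per_subset: int, target: int, seed: int = 0) -> List[int]:
    """Multiround tournament screening: score small random subsets,
    keep each subset's top features, repeat until `target` remain."""
    rng = random.Random(seed)
    survivors = list(range(n_features))
    while len(survivors) > target:
        rng.shuffle(survivors)  # new random grouping each round
        winners: List[int] = []
        for start in range(0, len(survivors), subset_size):
            subset = survivors[start:start + subset_size]
            scores = scorer(subset)  # one partial model build per subset
            ranked = sorted(subset, key=lambda j: scores[j], reverse=True)
            winners.extend(ranked[:keep_per_subset])
        if len(winners) == len(survivors):
            break  # no subset shrank (e.g. keep_per_subset >= subset size)
        survivors = winners
    # Final ranking of the remaining features in a single partial build.
    final = scorer(survivors)
    return sorted(survivors, key=lambda j: final[j], reverse=True)[:target]


def make_toy_data(n_samples: int, n_features: int, informative: Sequence[int],
                  n_classes: int, seed: int = 0):
    """Gaussian noise columns; `informative` columns are shifted by class label."""
    rng = random.Random(seed)
    y = [i % n_classes for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
    return X, y


def separation_scorer(X: List[List[float]], y: List[int]) -> Scorer:
    """Stand-in for random forest importances: between-class variance of the
    class means divided by the average within-class variance (univariate)."""
    classes = sorted(set(y))

    def score(cols: Sequence[int]) -> Dict[int, float]:
        out: Dict[int, float] = {}
        for j in cols:
            groups = {c: [row[j] for row, lab in zip(X, y) if lab == c] for c in classes}
            means = {c: sum(v) / len(v) for c, v in groups.items()}
            grand = sum(means.values()) / len(means)
            between = sum((m - grand) ** 2 for m in means.values()) / len(means)
            within = sum(sum((x - means[c]) ** 2 for x in v) / len(v)
                         for c, v in groups.items()) / len(groups)
            out[j] = between / (within + 1e-12)
        return out

    return score


if __name__ == "__main__":
    X, y = make_toy_data(100, 200, informative=[3, 57, 120, 199], n_classes=5, seed=1)
    picked = tournament_screen(200, separation_scorer(X, y), subset_size=20,
                               keep_per_subset=4, target=4, seed=2)
    print(sorted(picked))  # expected: [3, 57, 120, 199]
```

Because each partial build sees only `subset_size` columns, the cost of a single build stays bounded no matter how many features the full space contains. Replacing `separation_scorer` with a random forest fit on the subset's columns (for example, scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute) would bring the sketch closer to the random forest-based screening described in the abstract.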