Principled Non-Linear Feature Selection
The work addresses the scalability of non-linear feature selection for researchers and practitioners working with large datasets, though it appears incremental because it builds on existing KTA-based methods.
The paper tackles the computational cost of non-linear feature selection on large datasets by proposing randSel, a randomised algorithm with probabilistic guarantees for identifying relevant features. Experimental results show competitive performance, including a 3rd place finish in the ICML black box learning challenge and competitive results for signal peptide prediction.
Recent non-linear feature selection approaches employing greedy optimisation of Centred Kernel Target Alignment (KTA) exhibit strong results in terms of generalisation accuracy and sparsity. However, they are computationally prohibitive for large datasets. We propose randSel, a randomised feature selection algorithm with attractive scaling properties. Our theoretical analysis of randSel provides strong probabilistic guarantees for correct identification of relevant features. RandSel's characteristics make it an ideal candidate for identifying informative learned representations. We conduct experiments to establish the performance of this approach and present encouraging results, including a 3rd place result in the recent ICML black box learning challenge as well as competitive results for signal peptide prediction, an important problem in bioinformatics.
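
For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of centred kernel target alignment using its standard definition: the normalised Frobenius inner product between the centred kernel matrix and the centred label kernel yy^T. It is not the randSel algorithm itself, whose details are not given here. The RBF kernel, the gamma value, the synthetic data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def centre_kernel(K):
    """Centre a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centred_kta(K, y):
    """Centred alignment between kernel K and the ideal label kernel y y^T."""
    Kc = centre_kernel(K)
    Lc = centre_kernel(np.outer(y, y))
    denom = np.linalg.norm(Kc) * np.linalg.norm(Lc)  # Frobenius norms
    return float(np.sum(Kc * Lc) / denom) if denom > 0 else 0.0

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix (assumed kernel choice; gamma is illustrative)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Synthetic data (hypothetical): feature 0 carries the label, the rest are noise.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 50)
X = rng.normal(size=(100, 5))
X[:, 0] += 2.0 * y

# Score each feature on its own by the alignment of its kernel with the labels.
scores = [centred_kta(rbf_kernel(X[:, [j]]), y) for j in range(X.shape[1])]
best_feature = int(np.argmax(scores))
```

Scoring features one at a time, as above, is only the simplest way to use the criterion; greedy KTA-based methods instead evaluate the alignment of kernels built on feature subsets, which is the cost that motivates a randomised alternative.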