Universal Feature Selection for Simultaneous Interpretability of Multitask Datasets
This work addresses the problem of scalable, interpretable feature selection for researchers in scientific domains and offers a method that enables knowledge transfer between datasets, though it appears incremental because it builds on existing feature selection approaches.
The paper tackles the challenge of extracting meaningful features from complex, high-dimensional datasets by introducing BoUTS, a scalable feature selection algorithm that identifies both universal features (relevant to all datasets) and task-specific features (predictive for subsets of them). On seven chemical regression datasets, it achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to specialized methods.
Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, which limits their applicability to large datasets, or make restrictive assumptions about feature-property relationships, which hinders their ability to capture complex interactions. BoUTS, a general and scalable feature selection algorithm, overcomes these limitations and identifies both universal features relevant to all datasets and task-specific features predictive for specific subsets. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets and suggest deep connections among seemingly disparate chemical datasets. We expect these results to have important implications for manually guided inverse problems. Beyond its current application, BoUTS holds considerable potential for elucidating data-poor systems by leveraging information from similar data-rich systems. BoUTS represents a significant advance in cross-domain feature selection, potentially leading to progress in various scientific fields.
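To make the distinction between universal and task-specific features concrete, the following is a minimal sketch and not the BoUTS algorithm itself. It assumes that a per-task importance score is already available for every feature; the function name `split_features`, the threshold, the task names, and all scores are hypothetical and chosen only for illustration.

```python
# Illustrative only: this is NOT the BoUTS algorithm. It assumes we already
# have a per-task importance score for each feature (hypothetical values).

def split_features(importance, threshold=0.1):
    """Partition features into universal and task-specific sets.

    importance: dict mapping task name -> dict mapping feature name -> score.
    A feature counts as selected for a task when its score meets the threshold.
    """
    selected = {
        task: {f for f, s in scores.items() if s >= threshold}
        for task, scores in importance.items()
    }
    # Universal features are those selected for every task.
    universal = set.intersection(*selected.values()) if selected else set()
    # Task-specific features are what remains for each task.
    task_specific = {task: feats - universal for task, feats in selected.items()}
    return universal, task_specific


# Hypothetical importance scores for three made-up tasks.
importance = {
    "solubility": {"mol_weight": 0.6, "polarity": 0.3, "ring_count": 0.02},
    "melting_point": {"mol_weight": 0.5, "polarity": 0.05, "ring_count": 0.4},
    "toxicity": {"mol_weight": 0.2, "polarity": 0.5, "ring_count": 0.0},
}

universal, task_specific = split_features(importance)
print(sorted(universal))
print({task: sorted(feats) for task, feats in task_specific.items()})
```

In this toy input, one feature clears the threshold in every task and is therefore universal, while the others are selected only for a subset of tasks. A real method would have to learn which features matter rather than read them from a fixed score table.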