SparCA: Sparse Compressed Agglomeration for Feature Extraction and Dimensionality Reduction
This work addresses the need for generalizable, interpretable dimensionality reduction methods that apply across data types, although it appears incremental because it builds on existing feature grouping and selection techniques.
The authors address the limited generalizability of dimensionality reduction methods that require task-specific hyperparameter tuning. They propose SparCA, a method that produces interpretable features and performs strongly on downstream supervised learning tasks across diverse datasets without hyperparameter tuning.
The most effective dimensionality reduction procedures produce interpretable features from the raw input space while also providing good performance for downstream supervised learning tasks. For many methods, this requires optimizing one or more hyperparameters for a specific task, which can limit generalizability. In this study we propose sparse compressed agglomeration (SparCA), a novel dimensionality reduction procedure that involves a multistep hierarchical feature grouping, compression, and feature selection process. We demonstrate the characteristics and performance of the SparCA method across heterogeneous synthetic and real-world datasets, including images, natural language, and single-cell gene expression data. Our results show that SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream supervised learning tasks without the need for hyperparameter tuning.
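The abstract names three stages (hierarchical feature grouping, compression, and feature selection) without specifying them. The sketch below illustrates that general pattern under our own assumptions and is not the SparCA algorithm: features are grouped by single-linkage agglomeration on Pearson correlation (computed as connected components at a fixed cutoff), each group is compressed to the mean of its standardized members, and a compressed feature is kept if its absolute correlation with the target clears a second fixed cutoff. The function names (`group_features`, `compress`, `group_compress_select`) and the cutoffs 0.8 and 0.5 are hypothetical illustrative choices, not values from the paper. It uses only the Python standard library.

```python
from statistics import mean, pstdev


def zscore(col):
    """Standardize a column to mean 0, population std 1 (constant columns become all zeros)."""
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]


def corr(za, zb):
    """Pearson correlation of two already z-scored columns."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)


def group_features(cols, threshold=0.8):
    """Hierarchical grouping (assumed): single linkage on distance 1 - r.

    Cutting the single-linkage dendrogram at 1 - threshold yields the connected
    components of the graph with an edge wherever r >= threshold, so we compute
    those components with union-find. Only positive correlations merge features.
    """
    parent = list(range(len(cols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr(cols[i], cols[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(cols)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def compress(cols, groups):
    """Compression (assumed): one feature per group, the row-wise mean of member z-scores."""
    n = len(cols[0])
    return [[sum(cols[j][r] for j in g) / len(g) for r in range(n)] for g in groups]


def group_compress_select(X, y, threshold=0.8, min_abs_corr=0.5):
    """Hypothetical pipeline: group, compress, then keep features with |r(feature, y)| >= min_abs_corr.

    X is a list of rows (samples x features). Returns (kept groups, kept features),
    where each kept group lists the raw input columns it was built from.
    """
    cols = [zscore(list(c)) for c in zip(*X)]
    groups = group_features(cols, threshold)
    feats = compress(cols, groups)
    zy = zscore(y)
    keep = [k for k, f in enumerate(feats) if abs(corr(zscore(f), zy)) >= min_abs_corr]
    return [groups[k] for k in keep], [feats[k] for k in keep]
```

Because each retained feature is an average of named raw input columns, the returned groups map every output feature back to the original input space, which is one way a pipeline of this shape can stay interpretable. The fixed cutoffs stand in for the paper's claim of needing no per-task tuning; whether SparCA uses comparable defaults is not stated in this section.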