Recovering Imbalanced Clusters via Gradient-Based Projection Pursuit
This work addresses a specific challenge in exploratory data analysis for researchers working with imbalanced data distributions, although it appears to be an incremental improvement over existing projection pursuit techniques.
The paper tackles the recovery of imbalanced clusters in projection pursuit by proposing a gradient-based method for optimizing the projection index. Its sample-complexity analysis shows that imbalanced clusters can be recovered more easily than balanced ones, and its experiments show improved performance on real-world datasets when only a few samples are available.
Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections that contain either Imbalanced Clusters or a Bernoulli-Rademacher distribution, using a gradient-based technique to optimize the projection index. Because sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm's sample complexity within a Planted Vector setting, where we observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that holds for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method's applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms other methods when only a few samples are available.
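To make the Planted Vector setting and gradient-based index optimization concrete, here is a minimal sketch in NumPy. It is not the authors' algorithm: the choice of projection indices (third moment for Imbalanced Clusters, excess kurtosis for the Bernoulli-Rademacher case), the assumption that the data are already whitened, and the optimizer are all assumptions of this sketch. The update w ← ∇f(w)/‖∇f(w)‖ is a normalized-gradient step on the unit sphere (equivalent to a tensor-power or FastICA-style fixed-point iteration), swapped in as a simple stand-in for whatever gradient scheme the paper uses. The helpers `planted_data`, `third_moment`, `excess_kurtosis`, and `pursue` are hypothetical names introduced only for illustration.

```python
import numpy as np


def planted_data(n, d, kind, param, rng):
    """Hypothetical planted-vector model: one hidden unit direction v carries a
    non-Gaussian, zero-mean, unit-variance coordinate; every orthogonal
    direction is standard Gaussian, so the data are (approximately) whitened."""
    b = (rng.random(n) < param).astype(float)
    if kind == "imbalanced":
        # Two clusters with weights param and 1 - param, standardized.
        z = (b - param) / np.sqrt(param * (1.0 - param))
    elif kind == "bernoulli_rademacher":
        # Sparse random signs: nonzero with probability param, unit variance.
        z = b * rng.choice([-1.0, 1.0], size=n) / np.sqrt(param)
    else:
        raise ValueError(f"unknown kind: {kind}")
    Z = rng.standard_normal((n, d))
    Z[:, 0] = z
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation
    return Z @ Q.T, Q[:, 0]  # rows are x = Q z; planted direction v = Q e_1


def third_moment(X, w):
    """Index E[(w.x)^3] (skewness of the projection for whitened X, unit w)
    and its gradient 3 E[x (w.x)^2]."""
    y = X @ w
    return np.mean(y**3), 3.0 * (X.T @ y**2) / len(y)


def excess_kurtosis(X, w):
    """Index E[(w.x)^4] - 3|w|^4 (zero in expectation along Gaussian
    directions) and its gradient."""
    y = X @ w
    value = np.mean(y**4) - 3.0 * (w @ w) ** 2
    grad = 4.0 * (X.T @ y**3) / len(y) - 12.0 * (w @ w) * w
    return value, grad


def pursue(X, index, n_restarts=10, n_iter=50, rng=None):
    """Normalized-gradient pursuit on the unit sphere with random restarts;
    keeps the restart whose final |index| is largest."""
    rng = np.random.default_rng() if rng is None else rng
    best_w, best_val = None, -np.inf
    for _ in range(n_restarts):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            _, grad = index(X, w)
            # Move to the gradient direction and project back onto the sphere.
            # (For negative-kurtosis targets the sign of w may flip between
            # steps; the overlap |<w, v>| is unaffected.)
            w = grad / np.linalg.norm(grad)
        val = abs(index(X, w)[0])
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for kind, index in [("imbalanced", third_moment),
                        ("bernoulli_rademacher", excess_kurtosis)]:
        X, v = planted_data(20_000, 10, kind, 0.1, rng)
        w, val = pursue(X, index, rng=rng)
        print(kind, "overlap |<w, v>| =", round(abs(w @ v), 3))
```

One simple intuition that this sketch makes visible, though not necessarily the argument the paper makes: with a balanced split (`param = 0.5`) the standardized planted coordinate is Rademacher and has zero third moment, so the third-moment index carries no signal at all, whereas an imbalanced split gives a nonzero skewness that a low-order index can detect.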