Prototype Selection Using Topological Data Analysis
This work addresses the need for efficient, interpretable prototype selection in machine learning and offers practical tools for classification, though the contribution appears incremental because it builds on existing topological methods.
The paper tackles the problem of selecting representative subsets from large datasets by proposing a topological data analysis-based framework called Topological Prototype Selector (TPS), which preserves or improves classification performance while substantially reducing data size in both simulated and real data settings.
The statistical learning literature has recently seen rapid growth in methods that represent data using topological principles to capture structure and relationships. We propose a topological data analysis (TDA)-based framework, named Topological Prototype Selector (TPS), for selecting representative subsets (prototypes) from large datasets. We demonstrate the effectiveness of TPS on simulated data across different intrinsic data characteristics, and compare TPS against other currently used prototype selection methods in real data settings. In all simulated and real data settings, TPS preserves or improves classification performance while substantially reducing data size. These contributions advance both algorithmic and geometric aspects of prototype learning and offer practical tools for parallelized, interpretable, and efficient classification.