Persistence Bag-of-Words for Topological Data Analysis
Persistence bag-of-words addresses a bottleneck for researchers and practitioners in machine learning and topological data analysis by enabling efficient use of persistence diagrams.
The paper tackles the challenge of integrating complex persistence diagrams from topological data analysis into machine learning workflows by introducing a stable vectorized representation called persistence bag-of-words, which achieves state-of-the-art performance with significantly reduced computational time.
Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). However, PDs have a complex structure and are difficult to integrate into today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables seamless integration with machine learning. Comprehensive experiments show that the new representation matches or exceeds state-of-the-art performance in much less time than alternative approaches.
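The abstract does not spell out how the representation is built, so the following is only a minimal sketch of the general bag-of-words idea applied to PDs, under stated assumptions: each diagram is a list of (birth, death) pairs, points are mapped to (birth, persistence) coordinates, a codebook is learned with a plain k-means, and each diagram becomes a histogram of nearest-codeword counts. The function names (`to_birth_persistence`, `fit_codebook`, `persistence_bow`), the toy diagrams, and the choice of hard assignment are illustrative assumptions, not the paper's method, and no stability claim is made for this simplified version.

```python
import math
import random


def to_birth_persistence(diagram):
    """Map (birth, death) pairs to (birth, persistence = death - birth)."""
    return [(b, d - b) for b, d in diagram]


def nearest(point, centers):
    """Index of the center closest to `point` in Euclidean distance."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def fit_codebook(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k codewords.

    Assumption: k-means is used here only as a generic codebook learner.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centers.append((mean_x, mean_y))
            else:
                # Keep the old center if no point was assigned to it.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def persistence_bow(diagram, codebook):
    """Fixed-length vector: number of diagram points nearest to each codeword."""
    hist = [0] * len(codebook)
    for p in to_birth_persistence(diagram):
        hist[nearest(p, codebook)] += 1
    return hist


# Toy diagrams (made-up values, for illustration only).
diagrams = [
    [(0.0, 1.0), (0.2, 0.5), (0.1, 2.0)],
    [(0.0, 0.9), (0.3, 0.4), (0.5, 2.5), (0.6, 0.7)],
]
all_points = [p for d in diagrams for p in to_birth_persistence(d)]
codebook = fit_codebook(all_points, k=3)
vectors = [persistence_bow(d, codebook) for d in diagrams]
```

Each entry of `vectors` has the same length regardless of how many points its diagram contains, which is what lets diagrams of different sizes feed into standard classifiers.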