Dory: Overcoming Barriers to Computing Persistent Homology
Dory addresses a computational bottleneck for researchers in topological data analysis by enabling PH computation on large-scale data, with incremental improvements in efficiency.
The paper tackles the computational limitations of persistent homology (PH) by introducing Dory, an efficient and scalable algorithm that reduces memory usage and computation time. Dory enables PH on data sets with millions of points, as demonstrated by an analysis of genome-wide Hi-C data showing that the topology of the human genome changes upon auxin treatment.
Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of large data sets. Dory uses significantly less memory than published algorithms and also provides significant reductions in the computation time compared to most algorithms. It scales to process data sets with millions of points. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set. Results show that the topology of the human genome changes significantly upon treatment with auxin, a molecule that degrades cohesin, corroborating the hypothesis that cohesin plays a crucial role in loop formation in DNA.
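To make "multi-scale" concrete, the following is a minimal illustrative sketch of the simplest case of PH: 0-dimensional persistence (connected components) of a point cloud under a Vietoris-Rips style filtration, computed with a union-find structure. It is not Dory's algorithm and makes no attempt at Dory's memory or time optimizations; the function name `h0_persistence` and the toy points are assumptions introduced only for illustration.

```python
from itertools import combinations
import math


def h0_persistence(points):
    """Return the death scales of 0-dimensional classes (all born at scale 0).

    Illustrative only: builds every pairwise edge, so it uses O(n^2) memory
    and is suitable only for small point clouds.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at scale d, so one class dies here.
            parent[ri] = rj
            deaths.append(d)
    return deaths


# Two well-separated clusters: three points near the origin, two near x = 10.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(points)
print(deaths)
```

In this toy example, the three short-lived classes reflect merges within each cluster, while the single long-lived class reflects the gap between the two clusters. One component never dies and is therefore not listed. Higher-dimensional features such as the loops discussed in the Hi-C application require boundary-matrix reduction, which this sketch does not cover.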