45.5ATJun 4
Computing Projective Implicit Representations from Poset TowersTamal K. Dey, Florian Russold
A family of simplicial complexes connected by simplicial maps and indexed by a finite poset $P$ is called a poset tower. Poset towers subsume multi-parameter filtrations, zigzag filtrations, and one-parameter simplicial towers, while allowing arbitrary finite posets and simplicial maps. The homology of a poset tower is a $P$-persistence module. To compute it globally over $P$, we consider the chain complex segment of $P$-persistence modules $C_{\ell-1}\xleftarrow{\partial_{\ell}}C_\ell \xleftarrow{\partial_{\ell+1}}C_{\ell+1}$ induced by the simplices of the tower. Unlike in one-critical multi-filtrations, the chain modules $C_\ell$ need not be projective and may have a complicated structure. We address the problem of replacing this segment by projective modules and $P$-graded matrices while preserving homology. The resulting projective implicit representation (PiRep) plays the role of the graded boundary-matrix representation in the classical persistence algorithm: it converts simplicial data into algebraic input on which persistent homology can be computed globally over $P$. In particular, a PiRep can be used as input to algorithms for computing minimal presentations of persistent homology. We give an efficient algorithm to compute a PiRep from a poset tower. It constructs degreewise minimal presentations and asymptotically minimal second terms of projective resolutions of the chain modules $C_\ell$, lifts the boundary maps $\partial_\ell$ to these resolutions, and assembles the resulting data into a PiRep using an additional correction term. The method is tailored to chain complexes induced by poset towers and computes the required algebraic data combinatorially, exploiting their special structure and avoiding general-purpose algebraic reduction. In the context of poset towers, it is fully general and can serve as a foundation for efficient algorithms on specific posets.
LGApr 11, 2023Code
GRIL: A $2$-parameter Persistence Based Vectorization for Machine LearningCheng Xin, Soham Mukherjee, Shreyas N. Samaga et al.
$1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance indicating that GRIL can capture additional features enriching GNNs. We make the complete code for the proposed method available at https://github.com/soham0209/mpml-graph.
LGJun 1, 2022
Topological Deep Learning: Going Beyond Graph DataMustafa Hajij, Ghada Zamzmi, Theodore Papamarkou et al.
Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains. Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces. Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail. Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications.
LGJul 28, 2022
Topological structure of complex predictionsMeng Liu, Tamal K. Dey, David F. Gleich
Complex prediction models such as deep learning are the output from fitting machine learning, neural networks, or AI models to a set of training data. These are now standard tools in science. A key challenge with the current generation of models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into pictures representing a topological view. The result is a map of the predictions that enables inspection. The methods scale up to large datasets across different domains and enable us to detect labeling errors in training data, understand generalization in image classification, and inspect predictions of likely pathogenic mutations in the BRCA1 gene.
53.7CGApr 5
A Fast Algorithm for Computing Zigzag RepresentativesTamal K. Dey, Tao Hou, Dmitriy Morozov
Zigzag filtrations of simplicial complexes generalize the usual filtrations by allowing simplex deletions in addition to simplex insertions. The barcodes computed from zigzag filtrations encode the evolution of homological features. Although one can locate a particular feature at any index in the filtration using existing algorithms, the resulting representatives may not be compatible with the zigzag: a representative cycle at one index may not map into a representative cycle at its neighbor. For this, one needs to compute compatible representative cycles along each bar in the barcode. It is known that the barcode for a zigzag filtration with $m$ insertions and deletions can be computed in $O(m^Ï)$ time, where $Ï< 2.373$ is the matrix multiplication exponent. However, it is not known how to compute the compatible representatives so efficiently. For a non-zigzag filtration, the classical matrix-based algorithm provides representatives in $O(m^3)$ time, which can be improved to $O(m^Ï)$. However, no known algorithm for zigzag filtrations computes the representatives with the $O(m^3)$ time bound. We present an $O(m^2n)$ time algorithm for this problem, where $n\leq m$ is the size of the largest complex in the filtration.
LGFeb 17, 2024
Expressive Higher-Order Link Prediction through Hypergraph Symmetry BreakingSimon Zhang, Cheng Xin, Tamal K. Dey
A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher-order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph representation learners, are bounded in expressive power by the Generalized Weisfeiler Lehman-1 (GWL-1) algorithm, a generalization of the Weisfeiler Lehman-1 algorithm. However, GWL-1 has limited expressive power. In fact, induced subhypergraphs with identical GWL-1 valued nodes are indistinguishable. Furthermore, message passing on hypergraphs can already be computationally expensive, especially on GPU memory. To address these limitations, we devise a preprocessing algorithm that can identify certain regular subhypergraphs exhibiting symmetry. Our preprocessing algorithm runs once with complexity the size of the input hypergraph. During training, we randomly replace subhypergraphs identified by the algorithm with covering hyperedges to break symmetry. We show that our method improves the expressivity of GWL-1. Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation.
LGFeb 22, 2025
Quasi Zigzag Persistence: A Topological Framework for Analyzing Time-Varying DataTamal K. Dey, Shreyas N. Samaga
In this paper, we propose Quasi Zigzag Persistent Homology (QZPH) as a framework for analyzing time-varying data by integrating multiparameter persistence and zigzag persistence. To this end, we introduce a stable topological invariant that captures both static and dynamic features at different scales. We present an algorithm to compute this invariant efficiently. We show that it enhances the machine learning models when applied to tasks such as sleep-stage detection, demonstrating its effectiveness in capturing the evolving patterns in time-varying datasets.
CLJan 4
HalluZig: Hallucination Detection using Zigzag PersistenceShreyas N. Samaga, Gilberto Gonzalez Arroyo, Tamal K. Dey
The factual reliability of Large Language Models (LLMs) remains a critical barrier to their adoption in high-stakes domains due to their propensity to hallucinate. Current detection methods often rely on surface-level signals from the model's output, overlooking the failures that occur within the model's internal reasoning process. In this paper, we introduce a new paradigm for hallucination detection by analyzing the dynamic topology of the evolution of model's layer-wise attention. We model the sequence of attention matrices as a zigzag graph filtration and use zigzag persistence, a tool from Topological Data Analysis, to extract a topological signature. Our core hypothesis is that factual and hallucinated generations exhibit distinct topological signatures. We validate our framework, HalluZig, on multiple benchmarks, demonstrating that it outperforms strong baselines. Furthermore, our analysis reveals that these topological signatures are generalizable across different models and hallucination detection is possible only using structural signatures from partial network depth.
LGJun 11, 2024
D-GRIL: End-to-End Topological Learning with 2-parameter PersistenceSoham Mukherjee, Shreyas N. Samaga, Cheng Xin et al.
End-to-end topological learning using 1-parameter persistence is well-known. We show that the framework can be enhanced using 2-parameter persistence by adopting a recently introduced 2-parameter persistence based vectorization technique called GRIL. We establish a theoretical foundation of differentiating GRIL producing D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on standard benchmark graph datasets. Further, we exhibit that this framework can be applied in the context of bio-activity prediction in drug discovery.
LGJun 4, 2024
GEFL: Extended Filtration Learning for Graph ClassificationSimon Zhang, Soham Mukherjee, Tamal K. Dey
Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a barcode with four different types of bars and their explicit cycle representatives, is combined into the model by the readout function which is computed by extended persistence. The entire model is end-to-end differentiable. We use a link-cut tree data structure and parallelism to lower the complexity of computing extended persistence, obtaining a speedup of more than 60x over the state-of-the-art for extended persistence computation. This makes extended persistence feasible for machine learning. We show that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information. In particular, arbitrarily long cycles can be represented, which is difficult for finite receptive field message passing graph neural networks. Furthermore, we show the effectiveness of our method on real world datasets compared to many existing recent graph representation learning methods.
CVSep 15, 2019
Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological MethodsTamal K. Dey, Jiayuan Wang, Yusu Wang
Automatic Extraction of road network from satellite images is a goal that can benefit and even enable new technologies. Methods that combine machine learning (ML) and computer vision have been proposed in recent years which make the task semi-automatic by requiring the user to provide curated training samples. The process can be fully automatized if training samples can be produced algorithmically. Of course, this requires a robust algorithm that can reconstruct the road networks from satellite images reliably so that the output can be fed as training samples. In this work, we develop such a technique by infusing a persistence-guided discrete Morse based graph reconstruction algorithm into ML framework. We elucidate our contributions in two phases. First, in a semi-automatic framework, we combine a discrete-Morse based graph reconstruction algorithm with an existing CNN framework to segment input satellite images. We show that this leads to reconstructions with better connectivity and less noise. Next, in a fully automatic framework, we leverage the power of the discrete-Morse based graph reconstruction algorithm to train a CNN from a collection of images without labelled data and use the same algorithm to produce the final output from the segmented images created by the trained CNN. We apply the discrete-Morse based graph reconstruction algorithm iteratively to improve the accuracy of the CNN. We show promising experimental results of this new framework on datasets from SpaceNet Challenge.