Mustafa Hajij

LG
h-index: 33
Papers: 34
Citations: 636
Novelty: 46%
AI Score: 46

34 Papers

LG · Sep 26, 2023 · Code
ICML 2023 Topological Deep Learning Challenge : Design and Results

Mathilde Papillon, Mustafa Hajij, Helen Jenne et al.

This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the Python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions over its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

LG · Jun 1, 2022
Topological Deep Learning: Going Beyond Graph Data

Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou et al.

Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains. Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces. Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail. Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications.
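
As a rough illustration of the domain described in this abstract, the snippet below stores a combinatorial complex as a mapping from cells (non-empty subsets of a vertex set) to ranks and checks that the rank never decreases under inclusion. The names `is_combinatorial_complex` and `incidence` are hypothetical and are not the TopoNetX API, and the two conditions checked are an assumption about the paper's definition rather than a quotation of it.

```python
from itertools import combinations

def is_combinatorial_complex(vertices, rank):
    """rank maps frozenset cells to non-negative integer ranks."""
    # Every cell must be a non-empty subset of the vertex set with rank >= 0.
    for cell, r in rank.items():
        if not cell or not cell <= vertices or r < 0:
            return False
    # Assumed condition 1: every vertex appears as a singleton cell.
    if any(frozenset([v]) not in rank for v in vertices):
        return False
    # Assumed condition 2: the rank is order-preserving under inclusion.
    for x, y in combinations(rank, 2):
        if x < y and rank[x] > rank[y]:
            return False
        if y < x and rank[y] > rank[x]:
            return False
    return True

def incidence(rank, low, high):
    """Pairs (x, y) with rank[x] == low, rank[y] == high, and x a proper subset of y."""
    return sorted(
        (tuple(sorted(x)), tuple(sorted(y)))
        for x in rank for y in rank
        if rank[x] == low and rank[y] == high and x < y
    )

vertices = frozenset({0, 1, 2, 3})
cc = {frozenset([v]): 0 for v in vertices}
cc[frozenset([0, 1])] = 1
cc[frozenset([1, 2])] = 1
# A hyperedge-like cell: its intermediate subsets need not be present.
cc[frozenset([0, 1, 2, 3])] = 2
```

The rank-2 cell here has no triangular faces in the complex, which is the hypergraph-like freedom the abstract mentions, while the rank function still gives the hierarchy used to define incidence between ranks.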

LG · Apr 20, 2023
Architectures of Topological Deep Learning: A Survey of Message-Passing Topological Neural Networks

Mathilde Papillon, Sophia Sanborn, Mustafa Hajij et al.

The natural world is full of complex systems characterized by intricate relations between their components: from social interactions between individuals in a social network to electrostatic interactions between atoms in a protein. Topological Deep Learning (TDL) provides a comprehensive framework to process and extract knowledge from data associated with these systems, such as predicting the social community to which an individual belongs or predicting whether a protein can be a reasonable target for drug development. TDL has demonstrated theoretical and practical advantages that hold the promise of breaking ground in the applied sciences and beyond. However, the rapid growth of the TDL literature for relational systems has also led to a lack of unification in notation and language across message-passing Topological Neural Network (TNN) architectures. This presents a real obstacle for building upon existing works and for deploying message-passing TNNs to new real-world problems. To address this issue, we provide an accessible introduction to TDL for relational systems, and compare the recently published message-passing TNNs using a unified mathematical and graphical notation. Through an intuitive and critical review of the emerging field of TDL, we extract valuable insights into current challenges and exciting opportunities for future development.

LG · Sep 8, 2024
ICML Topological Deep Learning Challenge 2024: Beyond the Graph Domain

Guillermo Bernárdez, Lev Telyatnikov, Marco Montagna et al.

This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge that was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains such as hypergraphs or simplicial, cell, and combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge, and summarizes the main results and findings.

CV · Feb 24, 2025 · Code
CalibRefine: Deep Learning-Based Online Automatic Targetless LiDAR-Camera Calibration with Iterative and Attention-Driven Post-Refinement

Lei Cheng, Lihao Guo, Tianya Zhang et al.

Accurate multi-sensor calibration is essential for deploying robust perception systems in applications such as autonomous driving and intelligent transportation. Existing LiDAR-camera calibration methods often rely on manually placed targets, preliminary parameter estimates, or intensive data preprocessing, limiting their scalability and adaptability in real-world settings. In this work, we propose a fully automatic, targetless, and online calibration framework, CalibRefine, which directly processes raw LiDAR point clouds and camera images. Our approach is divided into four stages: (1) a Common Feature Discriminator that leverages relative spatial positions, visual appearance embeddings, and semantic class cues to identify and generate reliable LiDAR-camera correspondences, (2) a coarse homography-based calibration that uses the matched feature correspondences to estimate an initial transformation between the LiDAR and camera frames, serving as the foundation for further refinement, (3) an iterative refinement to incrementally improve alignment as additional data frames become available, and (4) an attention-based refinement that addresses non-planar distortions by leveraging a Vision Transformer and cross-attention mechanisms. Extensive experiments on two urban traffic datasets demonstrate that CalibRefine achieves high-precision calibration with minimal human input, outperforming state-of-the-art targetless methods and matching or surpassing manually tuned baselines. Our results show that robust object-level feature matching, combined with iterative refinement and self-supervised attention-based refinement, enables reliable sensor alignment in complex real-world conditions without ground-truth matrices or elaborate preprocessing. Code is available at https://github.com/radar-lab/Lidar_Camera_Automatic_Calibration

LG · May 11
TopoU-Net: a U-Net architecture for topological domains

Gaurav Gaurav, Ibrahem ALJabea, Yaroslav Zakomornyy et al.

Many modern datasets mix points, edges, regions, groups, objects, events, hyperedges, and relations. Yet neural architectures often force such data into grids, graphs, or sequences, obscuring higher-order structure and making encoder-decoder designs domain-specific. We view U-Net not as a grid-specific architecture, but as a hierarchical encoder-decoder principle: representation spaces, transport maps between levels, and skip connections between matched levels. Combinatorial complexes naturally supply these ingredients through cells, incidences, and ranks. We introduce TopoU-Net, a rank-path U-Net for topological domains. Given a path from an input rank to a bottleneck rank and back, the encoder lifts cochains upward along incidence maps, the decoder transports them downward, and skip connections merge features at matched ranks. Rank replaces spatial scale: choosing paths through nodes, edges, faces, hyperedges, or global cells becomes the central architectural decision. A key quantity is the bottleneck support ratio, the number of cells at the bottleneck relative to the number of cells at the input rank. This ratio is fixed by the complex and chosen path rather than by arbitrary pooling, and it clarifies when skip connections are optional, useful, or structurally important. Across node classification, graph classification, hypergraph node classification, mesh classification, and image reconstruction, TopoU-Net provides a reusable encoder-decoder template for higher-order structured data. Among the evaluated baselines, it achieves the strongest mean accuracy on six of eight node-classification datasets and four of five hypergraph datasets, with the largest gains on heterophilic graphs. Ablations show that removing skip connections is most damaging under severe bottleneck compression.

LG · Jun 9, 2024 · Code
TopoBench: A Framework for Benchmarking Topological Deep Learning

Lev Telyatnikov, Guillermo Bernardez, Marco Montagna et al.

This work introduces TopoBench, an open-source library designed to standardize benchmarking and accelerate research in topological deep learning (TDL). TopoBench decomposes TDL into a sequence of independent modules for data generation, loading, transforming and processing, as well as model training, optimization and evaluation. This modular organization provides flexibility for modifications and facilitates the adaptation and optimization of various TDL pipelines. A key feature of TopoBench is its support for transformations and lifting across topological domains. Mapping the topology and features of a graph to higher-order topological domains, such as simplicial and cell complexes, enables richer data representations and more fine-grained analyses. The applicability of TopoBench is demonstrated by benchmarking several TDL architectures across diverse tasks and datasets.

LG · Feb 14, 2024
Position: Topological Deep Learning is the New Frontier for Relational Learning

Theodore Papamarkou, Tolga Birdal, Michael Bronstein et al.

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.

LG · Feb 4, 2024
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains

Mustafa Hajij, Mathilde Papillon, Florian Frantzen et al.

We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at https://pyt-team.github.io/.

LG · Dec 15, 2023
Combinatorial Complexes: Bridging the Gap Between Cell Complexes and Hypergraphs

Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou et al.

Graph-based signal processing techniques have become essential for handling data in non-Euclidean spaces. However, there is a growing awareness that these graph models might need to be expanded into 'higher-order' domains to effectively represent the complex relations found in high-dimensional data. Such higher-order domains are typically modeled either as hypergraphs, or as simplicial, cubical or other cell complexes. In this context, cell complexes are often seen as a subclass of hypergraphs with additional algebraic structure that can be exploited, e.g., to develop a spectral theory. In this article, we promote an alternative perspective. We argue that hypergraphs and cell complexes emphasize different types of relations, which may have different utility depending on the application context. Whereas hypergraphs are effective in modeling set-type, multi-body relations between entities, cell complexes provide an effective means to model hierarchical, interior-to-boundary type relations. We discuss the relative advantages of these two choices and elaborate on the previously introduced concept of a combinatorial complex that enables co-existing set-type and hierarchical relations. Finally, we provide a brief numerical experiment to demonstrate that this modelling flexibility can be advantageous in learning tasks.

LG · Dec 19, 2023
Topo-MLP : A Simplicial Network Without Message Passing

Karthikeyan Natesan Ramamurthy, Aldo Guzmán-Sáenz, Mustafa Hajij

Due to their ability to model meaningful higher-order relations among a set of entities, higher-order network models have recently emerged as a powerful alternative to graph-based network models, which are only capable of modeling binary relationships. The message-passing paradigm is still predominantly used to learn representations, even for higher-order network models. While powerful, message passing can have disadvantages during inference, particularly when the higher-order connectivity information is missing or corrupted. To overcome such limitations, we propose Topo-MLP, a purely MLP-based simplicial neural network algorithm to learn the representation of elements in a simplicial complex without explicitly relying on message passing. Our framework utilizes a novel Higher Order Neighborhood Contrastive (HONC) loss which implicitly incorporates the simplicial structure into representation learning. Our proposed model's simplicity makes it faster during inference. Moreover, we show that our model is robust when faced with missing or corrupted connectivity structure.

LG · May 23, 2024
Attending to Topological Spaces: The Cellular Transformer

Rubén Ballester, Pablo Hernández-García, Mathilde Papillon et al.

Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, which can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes. First, we propose a new formulation of the usual self- and cross-attention mechanisms, tailored to leverage incidence relations in cell complexes, e.g., edge-face and node-edge relations. Additionally, we propose a set of topological positional encodings specifically designed for cell complexes. By transforming three graph datasets into cell complex datasets, our experiments reveal that CT not only achieves state-of-the-art performance, but does so without the need for more complex enhancements such as virtual nodes, in-domain structural encodings, or graph rewiring.

LG · Sep 4, 2025
Topotein: Topological Deep Learning for Protein Representation Learning

Zhiyu Wang, Arian Jamasb, Mustafa Hajij et al.

Protein representation learning (PRL) is crucial for understanding structure-function relationships, yet current sequence- and graph-based methods fail to capture the hierarchical organization inherent in protein structures. We introduce Topotein, a comprehensive framework that applies topological deep learning to PRL through the novel Protein Combinatorial Complex (PCC) and Topology-Complete Perceptron Network (TCPNet). Our PCC represents proteins at multiple hierarchical levels (from residues to secondary structures to complete proteins) while preserving geometric information at each level. TCPNet employs SE(3)-equivariant message passing across these hierarchical structures, enabling more effective capture of multi-scale structural patterns. Through extensive experiments on four PRL tasks, TCPNet consistently outperforms state-of-the-art geometric graph neural networks. Our approach demonstrates particular strength in tasks such as fold classification which require understanding of secondary structure arrangements, validating the importance of hierarchical topological features for protein analysis.

LG · May 27, 2025
Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework

Mustafa Hajij, Lennart Bastian, Sarah Osentoski et al.

We introduce copresheaf topological neural networks (CTNNs), a powerful unifying framework that encapsulates a wide spectrum of deep learning architectures, designed to operate on structured data, including images, point clouds, graphs, meshes, and topological manifolds. While deep learning has profoundly impacted domains ranging from digital assistants to autonomous systems, the principled design of neural architectures tailored to specific tasks and data types remains one of the field's most persistent open challenges. CTNNs address this gap by formulating model design in the language of copresheaves, a concept from algebraic topology that generalizes most practical deep learning models in use today. This abstract yet constructive formulation yields a rich design space from which theoretically sound and practically effective solutions can be derived to tackle core challenges in representation learning, such as long-range dependencies, oversmoothing, heterophily, and non-Euclidean domains. Our empirical results on structured data benchmarks demonstrate that CTNNs consistently outperform conventional baselines, particularly in tasks requiring hierarchical or localized sensitivity. These results establish CTNNs as a principled multi-scale foundation for the next generation of deep learning architectures.

LG · Oct 11, 2021
Signal Processing on Cell Complexes

T. Mitchell Roddenberry, Michael T. Schaub, Mustafa Hajij

The processing of signals supported on non-Euclidean domains has attracted large interest recently. Thus far, such non-Euclidean domains have been abstracted primarily as graphs with signals supported on the nodes, though the processing of signals on more general structures such as simplicial complexes has also been considered. In this paper, we give an introduction to signal processing on (abstract) regular cell complexes, which provide a unifying framework encompassing graphs, simplicial complexes, cubical complexes and various meshes as special cases. We discuss how appropriate Hodge Laplacians for these cell complexes can be derived. These Hodge Laplacians enable the construction of convolutional filters, which can be employed in linear filtering and non-linear filtering via neural networks defined on cell complexes.
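
To make the Hodge Laplacian mentioned above concrete, the sketch below uses the standard unweighted form $L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top$, where $B_k$ is the boundary matrix from $k$-cells to $(k-1)$-cells. The example complex (a single filled triangle), the edge orientations, and the helper names are assumptions made for illustration; the paper may use weighted or otherwise normalized variants.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Filled triangle on nodes 0, 1, 2 with edges oriented (0,1), (0,2), (1,2).
# B1: rows are nodes, columns are edges (-1 at the tail, +1 at the head).
B1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]
# B2: rows are edges, the single column is the face with boundary (1,2) - (0,2) + (0,1).
B2 = [[1], [-1], [1]]

# The boundary of a boundary vanishes: B1 @ B2 is the zero matrix.
boundary_of_boundary = matmul(B1, B2)

# L0 is the usual graph Laplacian; L1 acts on edge signals.
L0 = matmul(B1, transpose(B1))
L1 = add(matmul(transpose(B1), B1), matmul(B2, transpose(B2)))
# For this filled triangle, L1 equals 3 times the identity, so it has no
# harmonic edge signals, consistent with the triangle having no 1-dimensional hole.
```

A polynomial in `L1` applied to an edge signal then gives a linear convolutional filter of the kind the abstract refers to.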

LG · Oct 6, 2021
Data-Centric AI Requires Rethinking Data Notion

Mustafa Hajij, Ghada Zamzmi, Karthikeyan Natesan Ramamurthy et al.

The transition towards data-centric AI requires revisiting data notions from mathematical and implementational standpoints to obtain unified data-centric machine learning packages. Towards this end, this work proposes unifying principles offered by categorical and cochain notions of data, and discusses the importance of these principles in the transition to data-centric AI. In the categorical notion, data is viewed as a mathematical structure that we act upon via morphisms to preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. While these notions are almost orthogonal, they provide a unifying definition to view data, ultimately impacting the way machine learning packages are developed, implemented, and utilized by practitioners.

LG · Mar 6, 2021
Simplicial Complex Representation Learning

Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou et al.

Simplicial complexes form an important class of topological spaces that are frequently used in many application areas such as computer-aided design, computer graphics, and simulation. Representation learning on graphs, which are just 1-dimensional simplicial complexes, has attracted great attention in recent years. However, there has not been enough effort to extend representation learning to higher-dimensional simplicial objects due to the additional complexity these objects hold, especially when it comes to learning a representation of an entire simplicial complex. In this work, we propose a method for simplicial complex-level representation learning that embeds a simplicial complex into a universal embedding space in a way that preserves complex-to-complex proximity. Our method uses our novel geometric message passing schemes to learn an entire simplicial complex representation in an end-to-end fashion. We demonstrate the proposed model on a publicly available mesh dataset. To the best of our knowledge, this work presents the first method for learning simplicial complex-level representations.

LG · Feb 25, 2021
Persistent Homology and Graphs Representation Learning

Mustafa Hajij, Ghada Zamzmi, Xuanting Cai

This article aims to study the topological invariant properties encoded in node graph representational embeddings by utilizing tools available in persistent homology. Specifically, given a node embedding representation algorithm, we consider the case when these embeddings are real-valued. By viewing these embeddings as scalar functions on a domain of interest, we can utilize the tools available in persistent homology to study the topological information encoded in these representations. Our construction effectively defines a unique persistence-based graph descriptor, on both the graph and node levels, for every node representation algorithm. To demonstrate the effectiveness of the proposed method, we study the topological descriptors induced by DeepWalk, Node2Vec and Diff2Vec.

LG · Feb 16, 2021
Topological Deep Learning: Classification Neural Networks

Mustafa Hajij, Kyle Istvan

Topological deep learning is a formalism that is aimed at introducing topological language to deep learning for the purpose of utilizing the minimal mathematical structures to formalize problems that arise in a generic deep learning problem. This is the first of a sequence of articles with the purpose of introducing and studying this formalism. In this article, we define and study the classification problem in machine learning in a topological setting. Using this topological framework, we show when the classification problem is possible or not possible in the context of neural networks. Finally, we demonstrate how our topological setting immediately illuminates aspects of this problem that are not as readily apparent using traditional tools.

CV · Jan 21, 2021
TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection in Chest X-Ray Images

Mustafa Hajij, Ghada Zamzmi, Fawwaz Batayneh

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data such as connected components and holes and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on a vast array of data applications, images in particular. To capture the characteristics of both powerful tools, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, the automated detection of COVID-19 from chest X-ray (CXR) images. The experimental results show that the proposed network achieves excellent performance, suggesting the applicability of our method in practice.

LG · Dec 2, 2020
Algebraically-Informed Deep Networks (AIDN): A Deep Learning Approach to Represent Algebraic Structures

Mustafa Hajij, Ghada Zamzmi, Matthew Dawson et al.

One of the central problems at the interface of deep learning and mathematics is that of building learning systems that can automatically uncover underlying mathematical laws from observed data. In this work, we take one step towards building a bridge between algebraic structures and deep learning, and introduce AIDN, Algebraically-Informed Deep Networks. AIDN is a deep learning algorithm to represent any finitely-presented algebraic object with a set of deep neural networks. The deep networks obtained via AIDN are algebraically-informed in the sense that they satisfy the algebraic relations of the presentation of the algebraic structure that serves as the input to the algorithm. Our proposed network can robustly compute linear and non-linear representations of most finitely-presented algebraic structures such as groups, associative algebras, and Lie algebras. We evaluate our proposed approach and demonstrate its applicability to algebraic and geometric objects that are significant in low-dimensional topology. In particular, we study solutions of the Yang-Baxter equations and their applications to braid groups. Further, we study the representations of the Temperley-Lieb algebra. Finally, we show, using the Reshetikhin-Turaev construction, how our proposed deep learning approach can be utilized to construct new link invariants. We believe the proposed approach opens a path toward promising future research in deep learning applied to algebraic and geometric structures.

LG · Oct 2, 2020
Cell Complex Neural Networks

Mustafa Hajij, Kyle Istvan, Ghada Zamzmi

Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. They also provide a combinatorial formalism that allows the inclusion of complicated relationships of restrictive structures such as graphs and meshes. In this paper, we propose Cell Complexes Neural Networks (CXNs), a general, combinatorial and unifying construction for performing neural network-type computations on cell complexes. We introduce an inter-cellular message passing scheme on cell complexes that takes the topology of the underlying space into account and generalizes the message passing scheme on graphs. Finally, we introduce a unified cell complex encoder-decoder framework that enables learning representations of the cells of a given complex in Euclidean space. In particular, we show how our cell complex autoencoder construction can give, in the special case of cell2vec, a generalization of node2vec.

LG · Aug 31, 2020
A Topological Framework for Deep Learning

Mustafa Hajij, Kyle Istvan

We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.

LG · May 10, 2020
PageRank and The K-Means Clustering Algorithm

Mustafa Hajij, Eyad Said, Robert Todd

We utilize the PageRank vector to generalize the $k$-means clustering algorithm to directed and undirected graphs. We demonstrate that PageRank and other centrality measures can be used in our setting to robustly compute the centrality of nodes in a given graph. Furthermore, we show how our method can be generalized to metric spaces and apply it to other domains such as point clouds and triangulated meshes.
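
The PageRank vector that this abstract builds on can be computed by power iteration. The sketch below is a generic textbook version and not the authors' clustering method; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(adj, damping=0.85, iters=100):
    """adj[u] is the list of nodes that u links to; returns one score per node."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Teleportation term shared by every node.
        new = [(1.0 - damping) / n] * n
        for u, nbrs in enumerate(adj):
            if nbrs:
                # Split the damped score of u evenly among its out-neighbors.
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    new[v] += share
            else:
                # Dangling node: spread its damped score uniformly.
                for v in range(n):
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Undirected star graph: node 0 is the hub, nodes 1-3 are leaves.
star = [[1, 2, 3], [0], [0], [0]]
scores = pagerank(star)
```

On the star graph the hub receives the largest score and the leaves tie, which is the kind of centrality signal a PageRank-based notion of cluster center could rely on.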

CG · Feb 12, 2020
Fast and Scalable Complex Network Descriptor Using PageRank and Persistent Homology

Mustafa Hajij, Elizabeth Munch, Paul Rosen

The PageRank of a graph is a scalar function defined on the node set of the graph which encodes node centrality information. In this article, we use the PageRank function along with persistent homology to obtain a scalable graph descriptor and utilize it to compare the similarities between graphs. For a given graph $G(V,E)$, our descriptor can be computed in $O(|E|\,\alpha(|V|))$ time, where $\alpha$ is the inverse Ackermann function, which makes it scalable and computable on massive graphs. We show the effectiveness of our method by utilizing it on multiple shape mesh datasets.
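
An inverse-Ackermann factor like the one quoted above is characteristic of union-find, which is the standard tool for 0-dimensional persistence. The sketch below assumes that setting: it computes sublevel-set persistence pairs of a scalar node function on a graph. The function name `zero_dim_persistence` is hypothetical, any node function (PageRank, for instance) can be passed as `values`, and this is not presented as the authors' implementation. The initial edge sort adds an $O(|E| \log |E|)$ step that the sketch does not try to avoid.

```python
def zero_dim_persistence(values, edges):
    """Return (birth, death) pairs of connected components in the sublevel filtration.

    values[i] is the scalar at node i; edges is a list of (u, v) pairs.
    Components that never merge get death = infinity.
    """
    n = len(values)
    parent = list(range(n))

    def find(x):
        # Path halving; the root of a component is always its lowest-valued node.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    # An edge enters the filtration at the larger of its endpoint values.
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        t = max(values[u], values[v])
        # Elder rule: the component born later dies at the merge.
        if values[ru] > values[rv]:
            ru, rv = rv, ru
        if values[rv] < t:  # skip pairs with zero persistence
            pairs.append((values[rv], t))
        parent[rv] = ru

    for root in {find(i) for i in range(n)}:
        pairs.append((values[root], float("inf")))
    return sorted(pairs)

# Path graph 0-1-2-3-4 with local minima at nodes 0, 2, and 4.
example_values = [0, 2, 1, 3, 0.5]
example_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
diagram = zero_dim_persistence(example_values, example_edges)
```

Two graphs can then be compared by a distance between their diagrams; which distance the paper uses is not stated in the abstract.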

GT · Dec 20, 2019
Big Data Approaches to Knot Theory: Understanding the Structure of the Jones Polynomial

Jesse S F Levitt, Mustafa Hajij, Radmila Sazdanovic

We examine the structure and dimensionality of the Jones polynomial using manifold learning techniques. Our data set consists of more than 10 million knots up to 17 crossings and two other special families up to 2001 crossings. We introduce and describe a method for using filtrations to analyze infinite data sets where representative sampling is impossible or impractical, an essential requirement for working with knots and the data from knot invariants. In particular, this method provides a new approach for analyzing knot invariants using Principal Component Analysis. Using this approach on the Jones polynomial data, we find that it can be viewed as an approximately 3-dimensional manifold, that this description is surprisingly stable with respect to the filtration by the crossing number, and that the results suggest further structures to be examined and understood.

HC · Jun 22, 2019
TopoLines: Topological Smoothing for Line Charts

Paul Rosen, Ashley Suh, Christopher Salgado et al.

Line charts are commonly used to visualize a series of data values. When the data are noisy, smoothing is applied to make the signal more apparent. Conventional methods used to smooth line charts, e.g., using subsampling or filters, such as median, Gaussian, or low-pass, each optimize for different properties of the data. The properties generally do not include retaining peaks (i.e., local minima and maxima) in the data, which is an important feature for certain visual analytics tasks. We present TopoLines, a method for smoothing line charts using techniques from Topological Data Analysis. The design goal of TopoLines is to maintain prominent peaks in the data while minimizing any residual error. We evaluate TopoLines for 2 visual analytics tasks by comparing to 5 popular line smoothing methods with data from 4 application domains.

MLApr 21, 2019
Mesh Learning Using Persistent Homology on the Laplacian Eigenfunctions

Yunhao Zhang, Haowen Liu, Paul Rosen et al.

We use persistent homology along with the eigenfunctions of the Laplacian to study similarity amongst triangulated 2-manifolds. Our method relies on studying the lower-star filtration induced by the eigenfunctions of the Laplacian. This gives us a shape descriptor that inherits the rich information encoded in the eigenfunctions of the Laplacian. Moreover, the similarity between these descriptors can be easily computed using tools that are readily available in Topological Data Analysis. We provide experiments to illustrate the effectiveness of the proposed method.

CYNov 5, 2018
Integrating Project Spatial Coordinates into Pavement Management Prioritization

Shadi Hanandeh, Omar Elbagalati, Mustafa Hajij

To date, pavement management software products and studies on optimizing the prioritization of pavement maintenance and rehabilitation (M&R) have mainly focused on three parameters: the pre-treatment pavement condition, the rehabilitation cost, and the available budget. Yet, the role of the candidate projects' spatial characteristics in the decision-making process has not been deeply considered. Such a limitation, predominantly, allows the recommended M&R projects' schedule to involve simultaneously running but spatially scattered construction sites, which are very challenging to monitor and manage. This study introduces a novel approach to integrate pavement segments' spatial coordinates into the M&R prioritization analysis. The introduced approach aims at combining the pavement segments with converged spatial coordinates to be repaired in the same timeframe without compromising the allocated budget levels or the overall target Pavement Condition Index (PCI). Such a combination would result in minimizing the routing of crews, materials and other equipment among the construction sites and would provide better collaboration and communication between the pavement maintenance teams. Proposed herein is a novel spatial clustering algorithm that automatically finds the projects within certain budget and spatial constraints. The developed algorithm was successfully validated using 1,800 pavement maintenance projects from two real-life examples of the City of Milton, GA and the City of Tyler, TX.

CGOct 18, 2018
An Efficient Data Retrieval Parallel Reeb Graph Algorithm

Mustafa Hajij, Paul Rosen

The Reeb graph of a scalar function defined on a domain gives a topologically meaningful summary of that domain. Reeb graphs have been shown in the past decade to be of great importance in geometric processing, image processing, computer graphics, and computational topology. The demand for analyzing large data sets has increased in the last decade. Hence the parallelization of topological computations needs to be more fully considered. We propose a parallel augmented Reeb graph algorithm on triangulated meshes with and without a boundary. That is, in addition to our parallel algorithm for computing a Reeb graph, we describe a method for extracting the original manifold data from the Reeb graph structure. We demonstrate the running time of our algorithm on standard datasets. As an application, we show how our algorithm can be utilized in mesh segmentation algorithms.

SIApr 3, 2018
Homology-Preserving Multi-Scale Graph Skeletonization Using Mapper on Graphs

Paul Rosen, Mustafa Hajij, Bei Wang

Node-link diagrams are a popular method for representing graphs that capture relationships between individuals, businesses, proteins, and telecommunication endpoints. However, node-link diagrams may fail to convey insights regarding graph structures, even for moderately sized data of a few hundred nodes, due to visual clutter. We propose to apply the mapper construction -- a popular tool in topological data analysis -- to graph visualization, which provides a strong theoretical basis for summarizing the data while preserving their core structures. We develop a variation of the mapper construction targeting weighted, undirected graphs, called mapper on graphs, which generates homology-preserving skeletons of graphs. We further show how the adjustment of a single parameter enables multi-scale skeletonization of the input graph. We provide a software tool that enables interactive explorations of such skeletons and demonstrate the effectiveness of our method for synthetic and real-world data.

MLJan 18, 2018
Graph Based Analysis for Gene Segment Organization In a Scrambled Genome

Mustafa Hajij, Nataša Jonoska, Denys Kukushkin et al.

DNA rearrangement processes recombine gene segments that are organized on the chromosome in a variety of ways. The segments can overlap, interleave, or one may be a subsegment of another. We use directed graphs to represent segment organizations on a given locus, where contigs containing rearranged segments represent vertices and the edges correspond to the segment relationships. Using graph properties, we associate a point in a higher dimensional Euclidean space to each graph such that cluster formations and analysis can be performed with methods from topological data analysis. The method is applied to a recently sequenced model organism, Oxytricha trifallax, a species of ciliate with a highly scrambled genome that undergoes a massive rearrangement process after conjugation. The analysis shows some emerging star-like graph structures indicating that segments of a single gene can interleave, or even contain all of the segments from fifteen or more other genes in between its segments. We also observe that as many as six genes can have their segments mutually interleaving or overlapping.

CVDec 11, 2017
Parallel Mapper

Mustafa Hajij, Basem Assiri, Paul Rosen

The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper, we study the parallel analysis of the construction of Mapper. We give a provably correct parallel algorithm to execute Mapper on multiple processors and discuss the performance results that compare our approach to a reference sequential Mapper implementation. We report the performance experiments that demonstrate the efficiency of our method.

CVOct 24, 2017
The Shape of an Image: A Study of Mapper on Images

Alejandro Robles, Mustafa Hajij, Paul Rosen

We study the topological construction called Mapper in the context of simply connected domains, in particular on images. The Mapper construction can be considered as a generalization of contour, split, and join trees on simply connected domains. A contour tree on an image domain assumes the height function to be a piecewise linear Morse function. This is a rather restrictive class of functions and does not allow us to explore the topology of most real-world images. The Mapper construction avoids this limitation by assuming only continuity of the height function, allowing this construction to robustly deal with a significantly larger set of images. We provide a customized construction for Mapper on images, give a fast algorithm to compute it, and show how to simplify the Mapper structure in this case. Finally, we provide a simple procedure that guarantees the equivalence of Mapper to contour, join, and split trees on a simply connected domain.