Baris Coskunuzer

LG
h-index13
11papers
114citations
Novelty53%
AI Score39

11 Papers

LGNov 24, 2022Code
Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT

Cuneyt Gurcan Akcora, Murat Kantarcioglu, Yulia R. Gel et al.

Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains up to 95%. The developed framework provides the first bridge between the graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit

LGSep 4, 2024Code
Topological Methods in Machine Learning: A Tutorial for Practitioners

Baris Coskunuzer, Cüneyt Gürcan Akçora

Topological Machine Learning (TML) is an emerging field that leverages techniques from algebraic topology to analyze complex data structures in ways that traditional machine learning methods may not capture. This tutorial provides a comprehensive introduction to two key TML techniques, persistent homology and the Mapper algorithm, with an emphasis on practical applications. Persistent homology captures multi-scale topological features such as clusters, loops, and voids, while the Mapper algorithm creates an interpretable graph summarizing high-dimensional data. To enhance accessibility, we adopt a data-centric approach, enabling readers to gain hands-on experience applying these techniques to relevant tasks. We provide step-by-step explanations, implementations, hands-on examples, and case studies to demonstrate how these tools can be applied to real-world problems. The goal is to equip researchers and practitioners with the knowledge and resources to incorporate TML into their work, revealing insights often hidden from conventional machine learning methods. The tutorial code is available at https://github.com/cakcora/TopologyForML

LGNov 7, 2022
ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Andac Demir, Baris Coskunuzer, Ignacio Segovia-Dominguez et al.

In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).

LGOct 15, 2025
T3former: Temporal Graph Classification with Topological Machine Learning

Md. Joshem Uddin, Soham Changani, Baris Coskunuzer

Temporal graph classification plays a critical role in applications such as cybersecurity, brain connectivity analysis, social dynamics, and traffic monitoring. Despite its significance, this problem remains underexplored compared to temporal link prediction or node forecasting. Existing methods often rely on snapshot-based or recurrent architectures that either lose fine-grained temporal information or struggle with long-range dependencies. Moreover, local message-passing approaches suffer from oversmoothing and oversquashing, limiting their ability to capture complex temporal structures. We introduce T3former, a novel Topological Temporal Transformer that leverages sliding-window topological and spectral descriptors as first-class tokens, integrated via a specialized Descriptor-Attention mechanism. This design preserves temporal fidelity, enhances robustness, and enables principled cross-modal fusion without rigid discretization. T3former achieves state-of-the-art performance across multiple benchmarks, including dynamic social networks, brain functional connectivity datasets, and traffic networks. It also offers theoretical guarantees of stability under temporal and structural perturbations. Our results highlight the power of combining topological and spectral insights for advancing the frontier of temporal graph learning.

CVOct 13, 2024
TopOC: Topological Deep Learning for Ovarian and Breast Cancer Diagnosis

Saba Fatema, Brighton Nuwagira, Sayoni Chakraborty et al.

Microscopic examination of slides prepared from tissue samples is the primary tool for detecting and classifying cancerous lesions, a process that is time-consuming and requires the expertise of experienced pathologists. Recent advances in deep learning methods hold significant potential to enhance medical diagnostics and treatment planning by improving accuracy, reproducibility, and speed, thereby reducing clinicians' workloads and turnaround times. However, the necessity for vast amounts of labeled data to train these models remains a major obstacle to the development of effective clinical decision support systems. In this paper, we propose the integration of topological deep learning methods to enhance the accuracy and robustness of existing histopathological image analysis models. Topological data analysis (TDA) offers a unique approach by extracting essential information through the evaluation of topological patterns across different color channels. While deep learning methods capture local information from images, TDA features provide complementary global features. Our experiments on publicly available histopathological datasets demonstrate that the inclusion of topological features significantly improves the differentiation of tumor types in ovarian and breast cancers.

LGJun 14, 2024
MiNT: Multi-Network Training for Transfer Learning on Temporal Graphs

Kiarash Shamsi, Tran Gia Bao Ngo, Razieh Shirzadkhani et al.

Temporal Graph Learning (TGL) has become a robust framework for discovering patterns in dynamic networks and predicting future interactions. While existing research has largely concentrated on learning from individual networks, this study explores the potential of learning from multiple temporal networks and its ability to transfer to unobserved networks. To achieve this, we introduce Temporal Multi-network Training MiNT, a novel pre-training approach that learns from multiple temporal networks. With a novel collection of 84 temporal transaction networks, we pre-train TGL models on up to 64 networks and assess their transferability to 20 unseen networks. Remarkably, MiNT achieves state-of-the-art results in zero-shot inference, surpassing models individually trained on each network. Our findings further demonstrate that increasing the number of pre-training networks significantly improves transfer performance. This work lays the groundwork for developing Temporal Graph Foundation Models, highlighting the significant potential of multi-network pre-training in TGL.

LGJan 24, 2024
EMP: Effective Multidimensional Persistence for Graph Representation Learning

Ignacio Segovia-Dominguez, Yuzhou Chen, Cuneyt G. Akcora et al.

Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data's multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP's utility in graph classification tasks, showing its effectiveness. Results reveal that EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets.

LGJan 24, 2024
Time-Aware Knowledge Representations of Dynamic Objects with Multidimensional Persistence

Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen et al.

Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures, which allow for capturing implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, lack of time dimension in knowledge encoding mechanisms for time-dependent data leads to frequent model updates, poor learning performance, and, as a result, subpar decision-making. Here we propose a new approach to a time-aware knowledge representation mechanism that notably focuses on implicit time-dependent topological information along multiple geometric dimensions. In particular, we propose a new approach, named \textit{Temporal MultiPersistence} (TMP), which produces multidimensional topological fingerprints of the data by using the existing single parameter topological summaries. The main idea behind TMP is to merge the two newest directions in topological representation learning, that is, multi-persistence which simultaneously describes data shape evolution along multiple key parameters, and zigzag persistence to enable us to extract the most salient data shape information over time. We derive theoretical guarantees of TMP vectorizations and show its utility, in application to forecasting on benchmark traffic flow, Ethereum blockchain, and electrocardiogram datasets, demonstrating the competitive performance, especially, in scenarios of limited data records. In addition, our TMP method improves the computational efficiency of the state-of-the-art multipersistence summaries up to 59.5 times.

CRMay 18, 2023
Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain

Poupak Azad, Baris Coskunuzer, Murat Kantarcioglu et al.

The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret. To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior. The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. Moreover, our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.

LGOct 29, 2021
Topological Relational Learning on Graphs

Yuzhou Chen, Baris Coskunuzer, Yulia R. Gel

Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework of topological relational inference (TRI) which allows for integrating higher-order graph information to GNNs and for systematically learning a local graph structure. The key idea is to rewire the original graph by using the persistent homology of the small neighborhoods of nodes and then to incorporate the extracted topological summaries as the side information into the local algorithm. As a result, the new framework enables us to harness both the conventional information on the graph structure and information on the graph higher order topological properties. We derive theoretical stability guarantees for the new local topological representation and discuss their implications on the graph algebraic connectivity. The experimental results on node classification tasks demonstrate that the new TRI-GNN outperforms all 14 state-of-the-art baselines on 6 out 7 graphs and exhibit higher robustness to perturbations, yielding up to 10\% better performance under noisy scenarios.

LGApr 10, 2021
Smart Vectorizations for Single and Multiparameter Persistence

Baris Coskunuzer, CUneyt Gurcan Akcora, Ignacio Segovia Dominguez et al.

The machinery of topological data analysis becomes increasingly popular in a broad range of machine learning tasks, ranging from anomaly detection and manifold learning to graph classification. Persistent homology is one of the key approaches here, allowing us to systematically assess the evolution of various hidden patterns in the data as we vary a scale parameter. The extracted patterns, or homological features, along with information on how long such features persist throughout the considered filtration of a scale parameter, convey a critical insight into salient data characteristics and data organization. In this work, we introduce two new and easily interpretable topological summaries for single and multi-parameter persistence, namely, saw functions and multi-persistence grid functions, respectively. Compared to the existing topological summaries which tend to assess the numbers of topological features and/or their lifespans at a given filtration step, our proposed saw and multi-persistence grid functions allow us to explicitly account for essential complementary information such as the numbers of births and deaths at each filtration step. These new topological summaries can be regarded as the complexity measures of the evolving subspaces determined by the filtration and are of particular utility for applications of persistent homology on graphs. We derive theoretical guarantees on the stability of the new saw and multi-persistence grid functions and illustrate their applicability for graph classification tasks.