SIJan 18, 2023
Temporal Motifs for Financial Networks: A Study on Mercari, JPMC, and Venmo PlatformsPenghang Liu, Bahadir Altun, Rupam Acharyya et al.
Understanding the dynamics of financial transactions among people is critical for various applications such as fraud detection. One important aspect of financial transaction networks is temporality. The order and repetition of transactions can offer new insights when considered within the graph structure. Temporal motifs, defined as a set of nodes that interact with each other in a short time period, are a promising tool in this context. In this work, we study three unique temporal financial networks: transactions in Mercari, an online marketplace, payments in a synthetic network generated by J.P. Morgan Chase, and payments and friendships among Venmo users. We consider the fraud detection problem on the Mercari and J.P. Morgan Chase networks, for which the ground truth is available. We show that temporal motifs offer superior performance to several baselines, including a previous method that considers simple graph features and two node embedding techniques (LINE and node2vec), while being practical in terms of runtime performance. For the Venmo network, we investigate the interplay between financial and social relations on three tasks: friendship prediction, vendor identification, and analysis of temporal cycles. For friendship prediction, temporal motifs yield better results than general heuristics, such as Jaccard and Adamic-Adar measures. We are also able to identify vendors with high accuracy and observe interesting patterns in rare motifs, such as temporal cycles. We believe that the analysis, datasets, and lessons from this work will be beneficial for future research on financial transaction networks.
SIJun 21, 2023
Quantifying Node-based Core ResilienceJakir Hossain, Sucheta Soundarajan, Ahmet Erdem Sarıyüce
Core decomposition is an efficient building block for various graph analysis tasks such as dense subgraph discovery and identifying influential nodes. One crucial weakness of the core decomposition is its sensitivity to changes in the graph: inserting or removing a few edges can drastically change the core structure of a graph. Hence, it is essential to characterize, quantify, and, if possible, improve the resilience of the core structure of a given graph in global and local levels. Previous works mostly considered the core resilience of the entire graph or important subgraphs in it. In this work, we study node-based core resilience measures upon edge removals and insertions. We first show that a previously proposed measure, Core Strength, does not correctly capture the core resilience of a node upon edge removals. Next, we introduce the concept of dependency graph to capture the impact of neighbor nodes (for edge removal) and probable future neighbor nodes (for edge insertion) on the core number of a given node. Accordingly, we define Removal Strength and Insertion Strength measures to capture the resilience of an individual node upon removing and inserting an edge, respectively. As naive computation of those measures is costly, we provide efficient heuristics built on key observations about the core structure. We consider two key applications, finding critical edges and identifying influential spreaders, to demonstrate the usefulness of our new measures on various real-world networks and against several baselines. We also show that our heuristic algorithms are more efficient than the naive approaches.
CRDec 24, 2025
Better Call Graphs: A New Dataset of Function Call Graphs for Malware ClassificationJakir Hossain, Gurvinder Singh, Lukasz Ziarek et al.
Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of graph-based classification methods. To address this gap, we introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique FCGs extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK. Through extensive experiments using baseline classifiers, we demonstrate the necessity and value of BCG compared to existing datasets. BCG is publicly available at https://erdemub.github.io/BCG-dataset.
LGMar 6
Stochastic Event Prediction via Temporal Motif Transitionsİbrahim Bahadır Altun, Ahmet Erdem Sarıyüce
Networks of timestamped interactions arise across social, financial, and biological domains, where forecasting future events requires modeling both evolving topology and temporal ordering. Temporal link prediction methods typically frame the task as binary classification with negative sampling, discarding the sequential and correlated nature of real-world interactions. We introduce STEP (STochastic Event Predictor), a framework that reformulates temporal link prediction as a sequential forecasting problem in continuous time. STEP models event dynamics through discrete temporal motif transitions governed by Poisson processes, maintaining a set of open motif instances that evolve as new interactions arrive. At each step, the framework decides whether to initiate a new temporal motif or extend an existing one, selecting the most probable event via Bayesian scoring of temporal likelihoods and structural priors. STEP also produces compact, temporal motif-based feature vectors that can be concatenated with existing temporal graph neural network outputs, enriching their representations without architectural modifications. Experiments on five real-world datasets demonstrate up to 21% average precision gains over state-of-the-art baselines in classification and 0.99 precision in next $k$ sequential forecasting, with consistently lower runtime than competing motif-aware methods.
IRMar 5
Core-based Hierarchies for Efficient GraphRAGJakir Hossain, Ahmet Erdem Sarıyüce
Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
SIJun 16, 2025
Density-aware Walks for Coordinated Campaign DetectionAtul Anand Gopalakrishnan, Jakir Hossain, Tuğrulcan Elmas et al.
Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Distinguishing these coordinated efforts from genuine public discourse remains a significant challenge due to the sophisticated nature of such attacks. Our work focuses on detecting coordinated campaigns by modeling the problem as a graph classification task. We leverage the recently introduced Large Engagement Networks (LEN) dataset, which contains over 300 networks capturing engagement patterns from both fake and authentic trends on Twitter prior to the 2023 Turkish elections. The graphs in LEN were constructed by collecting interactions related to campaigns that stemmed from ephemeral astroturfing. Established graph neural networks (GNNs) struggle to accurately classify campaign graphs, highlighting the challenges posed by LEN due to the large size of its networks. To address this, we introduce a new graph classification method that leverages the density of local network structures. We propose a random weighted walk (RWW) approach in which node transitions are biased by local density measures such as degree, core number, or truss number. These RWWs are encoded using the Skip-gram model, producing density-aware structural embeddings for the nodes. Training message-passing neural networks (MPNNs) on these density-aware embeddings yields superior results compared to the simpler node features available in the dataset, with nearly a 12\% and 5\% improvement in accuracy for binary and multiclass classification, respectively. Our findings demonstrate that incorporating density-aware structural encoding with MPNNs provides a robust framework for identifying coordinated inauthentic behavior on social media networks such as Twitter.