SIJan 18, 2023
Temporal Motifs for Financial Networks: A Study on Mercari, JPMC, and Venmo PlatformsPenghang Liu, Bahadir Altun, Rupam Acharyya et al.
Understanding the dynamics of financial transactions among people is critical for various applications such as fraud detection. One important aspect of financial transaction networks is temporality. The order and repetition of transactions can offer new insights when considered within the graph structure. Temporal motifs, defined as a set of nodes that interact with each other in a short time period, are a promising tool in this context. In this work, we study three unique temporal financial networks: transactions in Mercari, an online marketplace, payments in a synthetic network generated by J.P. Morgan Chase, and payments and friendships among Venmo users. We consider the fraud detection problem on the Mercari and J.P. Morgan Chase networks, for which the ground truth is available. We show that temporal motifs offer superior performance to several baselines, including a previous method that considers simple graph features and two node embedding techniques (LINE and node2vec), while being practical in terms of runtime performance. For the Venmo network, we investigate the interplay between financial and social relations on three tasks: friendship prediction, vendor identification, and analysis of temporal cycles. For friendship prediction, temporal motifs yield better results than general heuristics, such as Jaccard and Adamic-Adar measures. We are also able to identify vendors with high accuracy and observe interesting patterns in rare motifs, such as temporal cycles. We believe that the analysis, datasets, and lessons from this work will be beneficial for future research on financial transaction networks.
SIJun 19, 2023
Using Motif Transitions for Temporal Graph GenerationPenghang Liu, A. Erdem Sarıyüce
Graph generative models are highly important for sharing surrogate data and benchmarking purposes. Real-world complex systems often exhibit dynamic nature, where the interactions among nodes change over time in the form of a temporal network. Most temporal network generation models extend the static graph generation models by incorporating temporality in the generation process. More recently, temporal motifs are used to generate temporal networks with better success. However, existing models are often restricted to a small set of predefined motif patterns due to the high computational cost of counting temporal motifs. In this work, we develop a practical temporal graph generator, Motif Transition Model (MTM), to generate synthetic temporal networks with realistic global and local features. Our key idea is modeling the arrival of new events as temporal motif transition processes. We first calculate the transition properties from the input graph and then simulate the motif transition processes based on the transition probabilities and transition rates. We demonstrate that our model consistently outperforms the baselines with respect to preserving various global and local temporal graph statistics and runtime performance.
AIOct 16, 2022
Limited or Biased: Modeling Sub-Rational Human Investors in Financial MarketsPenghang Liu, Kshama Dwarakanath, Svitlana S Vyetrenko et al.
Human decision-making in real-life deviates significantly from the optimal decisions made by fully rational agents, primarily due to computational limitations or psychological biases. While existing studies in behavioral finance have discovered various aspects of human sub-rationality, there lacks a comprehensive framework to transfer these findings into an adaptive human model applicable across diverse financial market scenarios. In this study, we introduce a flexible model that incorporates five different aspects of human sub-rationality using reinforcement learning. Our model is trained using a high-fidelity multi-agent market simulator, which overcomes limitations associated with the scarcity of labeled data of individual investors. We evaluate the behavior of sub-rational human investors using hand-crafted market scenarios and SHAP value analysis, showing that our model accurately reproduces the observations in the previous studies and reveals insights of the driving factors of human behavior. Finally, we explore the impact of sub-rationality on the investor's Profit and Loss (PnL) and market quality. Our experiments reveal that bounded-rational and prospect-biased human behaviors improve liquidity but diminish price efficiency, whereas human behavior influenced by myopia, optimism, and pessimism reduces market liquidity.
58.8LGApr 6
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time SeriesAnnita Vapsi, Penghang Liu, Saheed Obitayo et al.
Synthetic data is essential for training foundation models for time series (FMTS), but most generators assume static correlations, and are typically missing realistic inter-channel dependencies. We introduce DynLMC, a Dynamic Linear Model of Coregionalization, that incorporates time-varying, regime-switching correlations and cross-channel lag structures. Our approach produces synthetic multivariate time series with correlation dynamics that closely resemble real data. Fine-tuning three foundational models on DynLMC-generated data yields consistent zero-shot forecasting improvements across nine benchmarks. Our results demonstrate that modeling dynamic inter-channel correlations enhances FMTS transferability, highlighting the importance of data-centric pretraining.
LGNov 1, 2025
Privacy-Aware Time Series Synthesis via Public Knowledge DistillationPenghang Liu, Haibei Zhu, Eleonora Kreacic et al.
Sharing sensitive time series data in domains such as finance, healthcare, and energy consumption, such as patient records or investment accounts, is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by enforcing noise during training, inherently introducing a trade-off between privacy and utility. In many cases, sensitive sequences is correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.
AIFeb 13, 2024
LLM-driven Imitation of Subrational Behavior : Illusion or Reality?Andrea Coletta, Kshama Dwarakanath, Penghang Liu et al.
Modeling subrational agents, such as humans or economic households, is inherently challenging due to the difficulty in calibrating reinforcement learning models or collecting data that involves human subjects. Existing work highlights the ability of Large Language Models (LLMs) to address complex reasoning tasks and mimic human communication, while simulation using LLMs as agents shows emergent social behaviors, potentially improving our comprehension of human conduct. In this paper, we propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies though Imitation Learning. We make an assumption that LLMs can be used as implicit computational models of humans, and propose a framework to use synthetic demonstrations derived from LLMs to model subrational behaviors that are characteristic of humans (e.g., myopic behavior or preference for risk aversion). We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios, including the well-researched ultimatum game and marshmallow experiment. To gain confidence in our framework, we are able to replicate well-established findings from prior human studies associated with the above scenarios. We conclude by discussing the potential benefits, challenges and limitations of our framework.
CVJun 28, 2025
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language ModelsJian Chen, Wenye Ma, Penghang Liu et al.
Multimodal Large Language Models (MLLMs) have achieved remarkable visual reasoning abilities in natural images, text-rich documents, and graphic designs. However, their ability to interpret music sheets remains underexplored. To bridge this gap, we introduce MusiXQA, the first comprehensive dataset for evaluating and advancing MLLMs in music sheet understanding. MusiXQA features high-quality synthetic music sheets generated via MusiXTeX, with structured annotations covering note pitch and duration, chords, clefs, key/time signatures, and text, enabling diverse visual QA tasks. Through extensive evaluations, we reveal significant limitations of current state-of-the-art MLLMs in this domain. Beyond benchmarking, we developed Phi-3-MusiX, an MLLM fine-tuned on our dataset, achieving significant performance gains over GPT-based methods. The proposed dataset and model establish a foundation for future advances in MLLMs for music sheet understanding. Code, data, and model will be released upon acceptance.
AIOct 8, 2025
TS-Agent: A Time Series Reasoning Agent with Iterative Statistical Insight GatheringPenghang Liu, Elizabeth Fons, Svitlana Vyetrenko et al.
Large language models (LLMs) have shown strong abilities in reasoning and problem solving, but recent studies reveal that they still struggle with time series reasoning tasks, where outputs are often affected by hallucination or knowledge leakage. In this work we propose TS-Agent, a time series reasoning agent that leverages LLMs strictly for what they excel at, i.e., gathering evidence and synthesizing it into conclusions through step-by-step reasoning, while delegating the extraction of statistical and structural information to time series analytical tools. Instead of mapping time series into text tokens, images, or embeddings, our agent interacts with raw numeric sequences through atomic operators, records outputs in an explicit evidence log, and iteratively refines its reasoning under the guidance of a self-critic and a final quality gate. This design avoids multi-modal alignment training, preserves the native form of time series, ensures interpretability and verifiability, and mitigates knowledge leakage or hallucination. Empirically, we evaluate the agent on established benchmarks. Our experiments show that TS-Agent achieves performance comparable to state-of-the-art LLMs on understanding benchmarks, and delivers significant improvements on reasoning tasks, where existing models often rely on memorization and fail in zero-shot settings.