LG · CL · Feb 27, 2025

Unlocking Multi-Modal Potentials for Link Prediction on Dynamic Text-Attributed Graphs

arXiv:2502.19651v2 · 3 citations · h-index: 27
Originality: Incremental advance
AI Analysis

The paper addresses the suboptimal performance of dynamic graph analysis in applications such as social networks and recommendation systems. It is an incremental advance that focuses on modalities earlier work overlooked.

The paper tackles the problem of link prediction on dynamic text-attributed graphs by proposing MoMent, a multi-modal model that explicitly models, integrates, and aligns temporal, textual, and structural modalities, achieving up to 17.28% accuracy improvement and up to 31x speed-up across seven datasets.

Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal events (edges) alongside rich textual attributes. Existing studies can be broadly categorized into TGNN-driven and LLM-driven approaches, both of which encode textual attributes and temporal structures for DyTAG representation. We observe that DyTAGs inherently comprise three distinct modalities: temporal, textual, and structural, often exhibiting completely disjoint distributions. However, the first two modalities are largely overlooked by existing studies, leading to suboptimal performance. To address this, we propose MoMent, a multi-modal model that explicitly models, integrates, and aligns each modality to learn node representations for link prediction. Given the disjoint nature of the original modality distributions, we first construct modality-specific features and encode them using individual encoders to capture correlations across temporal patterns, semantic context, and local structures. Each encoder generates modality-specific tokens, which are then fused into comprehensive node representations with a theoretical guarantee. To avoid disjoint subspaces of these heterogeneous modalities, we propose a dual-domain alignment loss that first aligns their distributions globally and then fine-tunes coherence at the instance level. This enhances coherent representations from temporal, textual, and structural views. Extensive experiments across seven datasets show that MoMent achieves up to 17.28% accuracy improvement and up to 31x speed-up against eight baselines.
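The abstract's two-stage alignment idea (match modality distributions globally, then tighten agreement per node) can be illustrated with a small NumPy sketch. It is not the paper's loss: the abstract does not give the formulas, so the global term here is assumed to be a linear-kernel MMD (squared distance between batch means), the instance term is assumed to be an InfoNCE-style contrastive loss, and the fusion step is a plain average. All function names, the `weight` and `temperature` parameters, and the random token matrices are hypothetical.

```python
import numpy as np


def global_alignment(a, b):
    """Assumed global term: squared distance between the batch means
    of two modalities (a linear-kernel MMD estimate)."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(diff @ diff)


def instance_alignment(a, b, temperature=0.1):
    """Assumed instance term: InfoNCE-style loss in which row i of `a`
    should be most similar to row i of `b` (same node, other modality)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def dual_domain_loss(tokens, weight=1.0):
    """Sum the global and instance terms over every pair of modalities."""
    names = sorted(tokens)
    total = 0.0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = tokens[names[i]], tokens[names[j]]
            total += global_alignment(a, b) + weight * instance_alignment(a, b)
    return total


def fuse(tokens):
    """Placeholder fusion: average the modality tokens of each node."""
    return np.mean([tokens[k] for k in sorted(tokens)], axis=0)


# Hypothetical modality-specific tokens for 8 nodes, 16 dimensions each.
# The shifted means imitate modalities that live in disjoint regions.
rng = np.random.default_rng(0)
n_nodes, dim = 8, 16
tokens = {
    "temporal": rng.normal(size=(n_nodes, dim)),
    "textual": rng.normal(loc=2.0, size=(n_nodes, dim)),
    "structural": rng.normal(loc=-2.0, size=(n_nodes, dim)),
}

loss_disjoint = dual_domain_loss(tokens)

# Perfectly aligned case: every modality carries the same tokens.
aligned = {name: tokens["temporal"].copy() for name in tokens}
loss_aligned = dual_domain_loss(aligned)

node_repr = fuse(tokens)  # one fused vector per node
```

In this toy setup the loss for the shifted modalities is larger than for the identical ones, which is the behavior an alignment objective is meant to reward. A trained model would replace the random matrices with encoder outputs and minimize the loss by gradient descent.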

