Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction
This work addresses practical problems in social network analysis that matter to both researchers and practitioners, though its improvements over existing methods are incremental.
The paper tackles three limitations in information cascade prediction: temporal leakage in evaluation, feature-poor datasets, and computational inefficiency. It proposes a time-ordered splitting strategy, introduces the Taoke dataset with purchase conversions, and develops the lightweight CasTemp framework. CasTemp achieves state-of-the-art performance across four datasets with an orders-of-magnitude speedup and excels at predicting second-stage popularity conversions.
Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, existing work suffers from three critical limitations: (1) temporal leakage in current evaluation, where random cascade-based splits allow models to access future information and yield unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits practical applications; and (3) the computational inefficiency of complex graph-based methods, which require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring that models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter and product attributes and ground-truth purchase conversions, which captures the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with an orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions, a practical task critical for real-world applications.
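To make the contrast with random cascade-based splits concrete, here is a minimal sketch of a time-ordered split into consecutive windows. The abstract does not specify window sizes or how cascades near a window boundary are handled, so the `horizon` filter (dropping a cascade whose prediction horizon would reach into the next window) is an assumption, and `Cascade` and `time_ordered_split` are hypothetical names rather than the paper's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Cascade:
    cascade_id: str
    start_time: float  # hypothetical field: timestamp of the root post


def time_ordered_split(
    cascades: Sequence[Cascade],
    boundaries: Sequence[float],
    names: Sequence[str] = ("train", "val", "test"),
    horizon: float = 0.0,
) -> Dict[str, List[Cascade]]:
    """Assign each cascade to the consecutive window [boundaries[i], boundaries[i+1])
    that contains its start time.

    Assumption: a cascade whose start_time + horizon passes its window's end is
    dropped, so no label is computed from activity inside a later window.
    """
    if len(boundaries) != len(names) + 1:
        raise ValueError("need exactly one more boundary than window names")
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    splits: Dict[str, List[Cascade]] = {name: [] for name in names}
    # Chronological order, unlike a random shuffle of cascades.
    for cascade in sorted(cascades, key=lambda c: c.start_time):
        for i, name in enumerate(names):
            lo, hi = boundaries[i], boundaries[i + 1]
            if lo <= cascade.start_time < hi:
                if cascade.start_time + horizon <= hi:
                    splits[name].append(cascade)
                break  # each cascade belongs to at most one window
    return splits


# Usage: three consecutive windows of 10 time units, 2-unit prediction horizon.
starts = [1, 3, 5, 12, 14, 19, 21, 25, 29]
cascades = [Cascade(f"c{i}", float(t)) for i, t in enumerate(starts)]
splits = time_ordered_split(cascades, boundaries=[0, 10, 20, 30], horizon=2.0)
for name, members in splits.items():
    print(name, [c.cascade_id for c in members])
```

With these inputs, train holds c0 to c2, val holds c3 and c4, and test holds c6 and c7; c5 and c8 are dropped because their 2-unit horizon crosses a window boundary. Every training cascade therefore starts, and is labeled, strictly before any test cascade begins.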
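The abstract names temporal walks as one of CasTemp's components without giving details. The sketch below shows the general idea of a time-respecting random walk over a cascade's reshare edges: each step may only follow an edge whose timestamp is no earlier than the time the walk reached the current node. The forward direction, uniform sampling, walk length, and the names `Edge` and `temporal_walks` are all assumptions for illustration, not the paper's method.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (source user, target user, timestamp): the target reshared from the source.
Edge = Tuple[str, str, float]


def temporal_walks(
    edges: Sequence[Edge],
    root: str,
    start_time: float = 0.0,
    num_walks: int = 4,
    max_len: int = 5,
    seed: int = 0,
) -> List[List[Tuple[str, float]]]:
    """Sample random walks from the root that only follow edges whose timestamp
    is no earlier than the time the walk reached the current node."""
    out_edges: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for src, dst, t in edges:
        out_edges[src].append((dst, t))
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    for _ in range(num_walks):
        node, t_cur = root, start_time
        walk = [(node, t_cur)]
        while len(walk) < max_len:
            # Time-respecting step: skip edges that happened before we arrived.
            candidates = [(v, t) for v, t in out_edges[node] if t >= t_cur]
            if not candidates:
                break
            node, t_cur = rng.choice(candidates)
            walk.append((node, t_cur))
        walks.append(walk)
    return walks


edges = [
    ("r", "a", 1.0), ("r", "b", 2.0), ("a", "c", 3.0),
    ("b", "d", 1.5),  # earlier than r->b, so never reachable from b in a walk
    ("a", "e", 0.5),  # earlier than r->a, likewise excluded
]
for w in temporal_walks(edges, root="r", num_walks=3):
    print(" -> ".join(f"{n}@{t}" for n, t in w))
```

Every sampled walk is either r, a, c or r, b: the edges to d and e exist in the graph but are never taken, because following them would go backward in time.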
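The abstract describes Jaccard-based neighbor selection for capturing inter-cascade dependencies but does not say which sets are compared. As a minimal sketch, assume each cascade is represented by its set of participating users and that only cascades starting before the target are eligible (to stay consistent with leak-free evaluation). The functions `jaccard` and `select_neighbors` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_neighbors(
    target: str,
    participants: Dict[str, Set[str]],
    start_times: Dict[str, float],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return up to k earlier cascades with the highest nonzero Jaccard overlap
    of participating users with the target cascade."""
    t0 = start_times[target]
    scored = []
    for cid, users in participants.items():
        # Only consider cascades that started before the target (assumption),
        # so neighbor selection cannot look into the future.
        if cid == target or start_times[cid] >= t0:
            continue
        sim = jaccard(participants[target], users)
        if sim > 0.0:
            scored.append((cid, sim))
    # Highest similarity first; ties broken by cascade id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


participants = {
    "a": {"u1", "u2", "u4", "u7"},
    "b": {"u2", "u3"},
    "c": {"u5"},
    "d": {"u1", "u2", "u4"},  # identical users, but starts after the target
    "target": {"u1", "u2", "u4"},
}
start_times = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0, "target": 5.0}
print(select_neighbors("target", participants, start_times, k=2))
```

This prints `[('a', 0.75), ('b', 0.25)]`: cascade d would be a perfect match but is excluded because it starts later, and c shares no users with the target. Compared with message passing over a full inter-cascade graph, a top-k set-overlap lookup like this is cheap, which fits the abstract's emphasis on efficiency.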