Siguang Li

3 papers

1 citation

Novelty: 57%

AI Score: 44

Ranked #74,510 of 201,326 authors (top 37%); #4,534 in AI (top 32%)

3 Papers

61.2 LG Apr 22

Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]

Huaiyu Jia, Luofeng Zhou, Wentao Zhang et al.

Prediction markets are markets for trading claims on future events, such as presidential elections, and their prices provide continuously updated signals of collective beliefs. In decentralized platforms such as Polymarket, the market lifecycle spans market creation, token registration, trading, oracle interaction, dispute, and final settlement, yet the corresponding data are fragmented across heterogeneous off-chain and on-chain sources. We present the first continuously maintained dataset suite for the full lifecycle of decentralized prediction markets, built on Polymarket. To address the challenges of large-scale cross-source integration, incomplete linkage, and continuous synchronization, we build a unified relational data system that integrates three canonical layers: market metadata, fill-level trading records, and oracle-resolution events, through identifier resolution, on-chain recovery, and incremental updates. The resulting dataset spans October 2020 to March 2026 and comprises more than 770 thousand market records, over 943 million fill records, and nearly 2 million oracle events. We describe the data model, collection pipeline, and consistency mechanisms that make the dataset reproducible and extensible, and we demonstrate its utility through descriptive analyses of market activity and two downstream case studies: NBA outcome calibration and CPI expectation reconstruction.
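The abstract mentions outcome calibration as a downstream use of market prices. As a rough illustration of what such a check can look like (not the authors' pipeline), the sketch below bins market-implied probabilities and compares each bin's mean price with the observed frequency of the event. The function name `calibration_table`, the equal-width binning, and the toy inputs are all assumptions made for this example; none of them come from the dataset described above.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: one row per non-empty bin, as
# (bin_low, bin_high, mean_price, observed_frequency, count).
Row = Tuple[float, float, float, float, int]


def calibration_table(
    prices: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,
) -> List[Row]:
    """Compare market-implied probabilities with realized outcomes.

    prices   : last traded price of the "yes" token, assumed to lie in [0, 1]
    outcomes : 1 if the event resolved "yes", 0 otherwise
    """
    if len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must have the same length")

    price_sum = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins

    for p, y in zip(prices, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"price {p} is outside [0, 1]")
        # Equal-width bins; a price of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        price_sum[idx] += p
        hits[idx] += y
        counts[idx] += 1

    rows: List[Row] = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue  # skip empty bins rather than report 0/0
        rows.append(
            (
                i / n_bins,
                (i + 1) / n_bins,
                price_sum[i] / counts[i],
                hits[i] / counts[i],
                counts[i],
            )
        )
    return rows


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    for row in calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2):
        print(row)
```

In a well-calibrated market, the mean price and the observed frequency in each row should be close; large gaps in bins with many observations are the cases worth inspecting.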

22.6PFApr 3

The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences

Yiyue Cao, Mingzhe Zheng, Lin William Cong et al.

Modern blockchain ecosystems comprise many heterogeneous networks, creating a growing need for interoperability. Cross-chain bridges provide the core infrastructure for this interoperability by enabling verifiable state transitions that move assets and liquidity across chains. While prior work has focused mainly on bridge design and security, the system-level and economic consequences of cross-chain liquidity interoperability remain less understood. We present a large-scale empirical measurement study of cross-chain interoperability using a dataset spanning 20 blockchains and 16 major bridge protocols from 2022 to 2025. We model the multi-chain ecosystem as a time-varying weighted hypergraph and introduce two complementary metrics. Structural interoperability captures connectivity created by deployed bridge infrastructure, reflecting bridge coverage and redundancy independent of user behavior. Active interoperability captures realized cross-chain usage, measured by normalized transfer activity. This decomposition separates infrastructure capacity from actual utilization and yields several findings. The cross-chain network evolves from a sparse hub-and-spoke structure into a denser multi-hub core led by EVM-compatible chains. Bridge expansion and chain growth are uneven: some chains achieve broad structural access but limited realized usage, whereas others concentrate activity through a small set of routes. Overall, interoperability provision and interoperability use diverge substantially, showing that connectivity alone does not imply economically meaningful integration. These results provide a measurement framework for understanding how cross-chain infrastructure reshapes blockchain market structure and liquidity organization.

AI Mar 8

Machine Learning for Stress Testing: Uncertainty Decomposition in Causal Panel Prediction

Yu Wang, Xiangchen Liu, Siguang Li

Regulatory stress testing requires projecting credit losses under hypothetical macroeconomic scenarios, a fundamentally causal question that is typically treated as a prediction problem. We propose a framework for policy-path counterfactual inference in panels that transparently separates what can be learned from data from what requires assumptions about confounding. Our approach has four components: (i) observational identification of path-conditional means via iterated regression, enabling continuous macro-path contrasts without requiring a control group; (ii) causal set identification under bounded confounding, yielding sharp identified sets with interpretable breakdown values that communicate robustness in a single number; (iii) an oracle inequality showing that recursive rollout error is governed by a horizon-dependent amplification factor, providing a concrete answer to how far ahead one can reliably predict under stress; and (iv) importance-weighted conformal calibration bands with diagnostics that quantify extrapolation cost and trigger abstention when coverage guarantees degrade. The final output is a three-layer uncertainty decomposition that cleanly separates estimation uncertainty from confounding uncertainty. We validate all results through simulation and semi-synthetic experiments with real unemployment data, including a Covid retrospective demonstrating the framework's diagnostic value under extreme scenarios.
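Component (iv) refers to importance-weighted conformal calibration. The sketch below shows the generic weighted split-conformal quantile that this family of methods is usually built on; it is an assumption about the general technique, not a reconstruction of the paper's procedure. The names `weighted_conformal_quantile`, `scores`, `weights`, and `test_weight` are hypothetical, and the importance weights are taken as given rather than estimated.

```python
import math
from typing import Sequence


def weighted_conformal_quantile(
    scores: Sequence[float],
    weights: Sequence[float],
    test_weight: float,
    alpha: float,
) -> float:
    """Return the half-width of a weighted split-conformal band.

    scores      : nonconformity scores on calibration data, e.g. |y - y_hat|
    weights     : importance weight of each calibration point (assumed given)
    test_weight : importance weight of the scenario being predicted
    alpha       : target miscoverage level, e.g. 0.1 for a 90% band

    The test point contributes its own mass at +infinity, so the result is
    infinite when the calibration points cannot supply (1 - alpha) of the
    total weight.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or test_weight < 0:
        raise ValueError("importance weights must be non-negative")

    total = sum(weights) + test_weight
    if total <= 0:
        raise ValueError("total weight must be positive")

    cumulative = 0.0
    # Walk the scores in increasing order, accumulating normalized weight.
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # Mass on the test point is too large: the band is uninformative.
    return math.inf


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    q = weighted_conformal_quantile([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 1.0, 0.25)
    print("band half-width:", q)
```

An infinite return value signals that the scenario lies too far from the calibration data for a finite band at the requested level, which is one simple way a diagnostic could flag a case for abstention.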