Lin William Cong

LG
Semantic Scholar Profile
h-index: 10
10 papers
85 citations
Novelty: 44%
AI Score: 51

10 Papers

GT · Mar 22
Inequality in the Age of Pseudonymity

Aviv Yaish, Nir Chemaya, Dahlia Malkhi et al.

Inequality measures such as the Gini coefficient are used to inform and motivate policymaking, and are increasingly applied to digital platforms. We analyze how these measures fare in the pseudonymous settings that are common in the digital age. One key challenge of such environments is that actors can create fake identities under false names, also known as "Sybils." While some actors may do so to preserve their privacy, we show that this can hamper inequality measurement: no measure satisfying the literature's canonical set of desired properties can assess the inequality of an economy that may harbor Sybils. We characterize the class of all Sybil-proof measures and prove that they must satisfy a relaxed version of the aforementioned properties. Furthermore, we show that the structure imposed by Sybil-proofness restricts the ability to assess inequality at a fine-grained level. We then apply our results to prove that popular measures are not Sybil-proof, the famous Gini coefficient being just one of many examples. Finally, we examine dynamics leading to the creation of Sybils in digital and traditional settings.
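
As a minimal illustration of why the Gini coefficient is not Sybil-proof, assume a Sybil attack simply splits one actor's wealth across several identities. The sketch below (the function name `gini` is hypothetical, not from the paper) computes the Gini coefficient with the standard mean-absolute-difference formula, G = Σ_i Σ_j |x_i − x_j| / (2 n² x̄), and compares an economy before and after its richest actor splits into two identities, with no wealth moving between real people.

```python
from fractions import Fraction

def gini(wealth):
    """Gini coefficient via the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).
    Uses Fraction so the results are exact."""
    xs = [Fraction(w) for w in wealth]
    n = len(xs)
    mean = sum(xs) / n
    total_abs_diff = sum(abs(a - b) for a in xs for b in xs)
    return total_abs_diff / (2 * n * n * mean)

honest = [1, 1, 10]          # three real actors
with_sybils = [1, 1, 5, 5]   # the richest actor splits into two identities

print(gini(honest))       # 1/2
print(gini(with_sybils))  # 1/3
```

The split lowers the measured Gini from 1/2 to 1/3, so in this toy economy an actor can make the distribution look more equal merely by creating identities.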

LG · May 18
Privacy Preserving Reinforcement Learning with One-Sided Feedback

Lin William Cong, Guangyan Gan, Hanzhang Qin et al.

We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound that matches the known lower bounds for non-private RL. The bound is expressed in terms of the privacy parameter ε_ρ, the time horizon H, and the optimality-gap parameter α. Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.

GN · May 1
Trust Dynamics in Cryptocurrency Markets: Centralized vs. Decentralized Exchanges

Xintong Wu, Wanlin Deng, Yutong Quan et al.

Trust mechanisms diverge between centralized and decentralized exchanges, representing distinct sociotechnical governance paradigms. However, quantifying trust dynamics and their redistribution between these architectures remains empirically challenging, limiting understanding of how institutional shocks affect market behavior. The FTX collapse offers a natural experiment to bridge this gap. Through an interdisciplinary approach combining causal inference and computational text analysis, we find significant price declines and capital reallocation from centralized to decentralized exchanges following the event. While sentiment metrics showed no sharp discontinuities, topic modeling and network analysis of Discord communities reveal that seasonal holiday discourse obscured underlying trust concerns in centralized exchange forums. These findings underscore the fragility of institutional trust architectures and demonstrate how mixed methods can illuminate behavioral patterns during systemic crises, offering insights for exchange risk management and regulatory assessment.

LG · Apr 22
Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]

Huaiyu Jia, Luofeng Zhou, Wentao Zhang et al.

Prediction markets are markets for trading claims on future events, such as presidential elections, and their prices provide continuously updated signals of collective beliefs. In decentralized platforms such as Polymarket, the market lifecycle spans market creation, token registration, trading, oracle interaction, dispute, and final settlement, yet the corresponding data are fragmented across heterogeneous off-chain and on-chain sources. We present the first continuously maintained dataset suite for the full lifecycle of decentralized prediction markets, built on Polymarket. To address the challenges of large-scale cross-source integration, incomplete linkage, and continuous synchronization, we build a unified relational data system that integrates three canonical layers (market metadata, fill-level trading records, and oracle-resolution events) through identifier resolution, on-chain recovery, and incremental updates. The resulting dataset spans October 2020 to March 2026 and comprises more than 770 thousand market records, over 943 million fill records, and nearly 2 million oracle events. We describe the data model, collection pipeline, and consistency mechanisms that make the dataset reproducible and extensible, and we demonstrate its utility through descriptive analyses of market activity and two downstream case studies: NBA outcome calibration and CPI expectation reconstruction.
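
The three-layer integration can be pictured as a relational join keyed on a market identifier. The sketch below uses Python's built-in `sqlite3` with a hypothetical, heavily simplified schema (the table and column names `markets`, `fills`, `oracle_events`, `market_id`, and so on are assumptions for illustration, not the dataset's published schema) and toy rows. It summarizes each market's traded volume, volume-weighted average price, and resolved outcome.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not the real dataset's.
SCHEMA = """
CREATE TABLE markets (market_id TEXT PRIMARY KEY, question TEXT);
CREATE TABLE fills (fill_id INTEGER PRIMARY KEY, market_id TEXT, price REAL, size REAL);
CREATE TABLE oracle_events (market_id TEXT, event_type TEXT, outcome TEXT);
"""

def build_demo_db() -> sqlite3.Connection:
    """In-memory database with toy rows for each of the three layers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO markets VALUES (?, ?)", [
        ("m1", "Will team A win?"),
        ("m2", "Will CPI exceed 3%?"),
    ])
    conn.executemany("INSERT INTO fills (market_id, price, size) VALUES (?, ?, ?)", [
        ("m1", 0.60, 100.0),
        ("m1", 0.70, 50.0),
        ("m2", 0.20, 10.0),
    ])
    conn.executemany("INSERT INTO oracle_events VALUES (?, ?, ?)", [
        ("m1", "resolved", "YES"),   # m2 has no resolution yet
    ])
    return conn

def lifecycle_summary(conn: sqlite3.Connection) -> list:
    """One row per traded market: (market_id, volume, VWAP, outcome or None).
    The inner JOIN drops markets with no fills; the LEFT JOIN keeps unresolved
    markets. Assumes at most one 'resolved' event per market, otherwise fills
    would be double-counted."""
    query = """
    SELECT m.market_id,
           SUM(f.size) AS volume,
           SUM(f.price * f.size) / SUM(f.size) AS vwap,
           o.outcome
    FROM markets m
    JOIN fills f ON f.market_id = m.market_id
    LEFT JOIN oracle_events o
           ON o.market_id = m.market_id AND o.event_type = 'resolved'
    GROUP BY m.market_id, o.outcome
    ORDER BY m.market_id
    """
    return conn.execute(query).fetchall()

for row in lifecycle_summary(build_demo_db()):
    print(row)   # m1: volume 150, VWAP about 0.633, YES; m2: outcome None
```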

GN · Feb 10
Behavioral Economics of AI: LLM Biases and Corrections

Pietro Bini, Lin William Cong, Xing Huang et al.

Do generative AI models, particularly large language models (LLMs), exhibit systematic behavioral biases in economic and financial decisions? If so, how can these biases be mitigated? Drawing on the cognitive psychology and experimental economics literatures, we conduct the most comprehensive set to date of experiments originally designed to document human biases, applying them to prominent LLM families across model versions and scales. We document systematic patterns in LLM behavior. In preference-based tasks, responses become more human-like as models become more advanced or larger, while in belief-based tasks, advanced large-scale models frequently generate rational responses. Prompting LLMs to make rational decisions reduces biases.

LG · Jan 28, 2025
Growing the Efficient Frontier on Panel Trees

Lin William Cong, Guanhao Feng, Jingyu He et al.

We introduce a new class of tree-based models, P-Trees, for analyzing (unbalanced) panels of individual asset returns, generalizing high-dimensional sorting with economic guidance and interpretability. Under the mean-variance efficient framework, P-Trees construct test assets that significantly advance the efficient frontier compared to commonly used test assets, with alphas unexplained by benchmark pricing models. P-Tree tangency portfolios also constitute traded factors, recovering the pricing kernel and outperforming popular observable and latent factor models for investments and cross-sectional pricing. Finally, P-Trees capture the complexity of asset returns with sparsity, achieving out-of-sample Sharpe ratios close to those attained only by over-parameterized large models.
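
For readers less familiar with the mean-variance objects named here, the sketch below computes a tangency portfolio and its Sharpe ratio from expected excess returns μ and covariance Σ using the textbook formulas w ∝ Σ⁻¹μ (normalized to sum to one) and SR = √(μ′Σ⁻¹μ). It does not implement P-Trees themselves; to mimic the paper's setting one would plug in moments of the P-Tree test assets, whereas the inputs here are made-up numbers and the function name is hypothetical.

```python
import numpy as np

def tangency_portfolio(mu, sigma):
    """Return (weights, sharpe) for the maximum-Sharpe (tangency) portfolio.
    mu: expected excess returns, shape (N,); sigma: covariance matrix, shape (N, N).
    Weights are normalised to sum to one; assumes 1' Sigma^{-1} mu > 0."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    raw = np.linalg.solve(sigma, mu)      # Sigma^{-1} mu without forming the inverse
    weights = raw / raw.sum()
    sharpe = float(np.sqrt(mu @ raw))     # sqrt(mu' Sigma^{-1} mu), scale-invariant
    return weights, sharpe

mu = [0.10, 0.20]
sigma = [[0.04, 0.0], [0.0, 0.09]]
w, sr = tangency_portfolio(mu, sigma)
print(w.round(4), round(sr, 4))   # [0.5294 0.4706] 0.8333
```

With a diagonal Σ the squared Sharpe ratio is just the sum of squared individual Sharpe ratios, (0.1/0.2)² + (0.2/0.3)², which is a quick way to check the output.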

PF · Apr 3
The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences

Yiyue Cao, Mingzhe Zheng, Lin William Cong et al.

Modern blockchain ecosystems comprise many heterogeneous networks, creating a growing need for interoperability. Cross-chain bridges provide the core infrastructure for this interoperability by enabling verifiable state transitions that move assets and liquidity across chains. While prior work has focused mainly on bridge design and security, the system-level and economic consequences of cross-chain liquidity interoperability remain less understood. We present a large-scale empirical measurement study of cross-chain interoperability using a dataset spanning 20 blockchains and 16 major bridge protocols from 2022 to 2025. We model the multi-chain ecosystem as a time-varying weighted hypergraph and introduce two complementary metrics. Structural interoperability captures connectivity created by deployed bridge infrastructure, reflecting bridge coverage and redundancy independent of user behavior. Active interoperability captures realized cross-chain usage, measured by normalized transfer activity. This decomposition separates infrastructure capacity from actual utilization and yields several findings. The cross-chain network evolves from a sparse hub-and-spoke structure into a denser multi-hub core led by EVM-compatible chains. Bridge expansion and chain growth are uneven: some chains achieve broad structural access but limited realized usage, whereas others concentrate activity through a small set of routes. Overall, interoperability provision and interoperability use diverge substantially, showing that connectivity alone does not imply economically meaningful integration. These results provide a measurement framework for understanding how cross-chain infrastructure reshapes blockchain market structure and liquidity organization.
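
To make the structural/active distinction concrete, the sketch below treats each bridge as a hyperedge over the chains it supports and computes two toy per-chain metrics. The definitions are simplified assumptions for illustration, not the paper's formulas: structural coverage counts the other chains reachable through at least one bridge, redundancy counts the bridges a chain participates in, and active share is the fraction of total transfer volume in which the chain is an endpoint. Bridge names, chains, and volumes are hypothetical, and the sketch ignores the time-varying weights.

```python
from collections import defaultdict

# Hypothetical bridges: each is a hyperedge over the set of chains it supports.
BRIDGES = {
    "bridge_a": {"ethereum", "arbitrum", "polygon"},
    "bridge_b": {"ethereum", "bsc"},
    "bridge_c": {"ethereum", "arbitrum"},
}
# Hypothetical transfers: (source chain, destination chain, volume).
TRANSFERS = [
    ("ethereum", "arbitrum", 70.0),
    ("arbitrum", "ethereum", 20.0),
    ("ethereum", "bsc", 10.0),
]

def structural_interop(bridges):
    """Per chain: (coverage, redundancy). Coverage = number of other chains
    reachable via at least one bridge; redundancy = number of bridges the
    chain participates in. Independent of user behaviour."""
    neighbors = defaultdict(set)
    n_bridges = defaultdict(int)
    for chains in bridges.values():
        for c in chains:
            neighbors[c] |= chains - {c}
            n_bridges[c] += 1
    return {c: (len(neighbors[c]), n_bridges[c]) for c in neighbors}

def active_interop(transfers, chains):
    """Per chain: fraction of total transfer volume with that chain as an
    endpoint (source or destination). Chains with no transfers get 0.0."""
    total = sum(v for _, _, v in transfers)
    touched = defaultdict(float)
    for src, dst, v in transfers:
        touched[src] += v
        if dst != src:
            touched[dst] += v
    return {c: touched[c] / total for c in chains}

structural = structural_interop(BRIDGES)
active = active_interop(TRANSFERS, structural.keys())
for chain in sorted(structural):
    print(chain, structural[chain], round(active[chain], 2))
```

In this toy data, polygon reaches two chains through a deployed bridge yet carries no realized volume, a miniature version of structural access without active use.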

ML · Apr 1
Bridging Structured Knowledge and Data: A Unified Framework with Finance Applications

Yi Cao, Zexun Chen, Lin William Cong et al.

We develop Structured-Knowledge-Informed Neural Networks (SKINNs), a unified estimation framework that embeds theoretical, simulated, previously learned, or cross-domain insights as differentiable constraints within flexible neural function approximation. SKINNs jointly estimate neural network parameters and economically meaningful structural parameters in a single optimization problem, enforcing theoretical consistency not only on observed data but over a broader input domain through collocation, and therefore nesting approaches such as functional GMM, Bayesian updating, transfer learning, PINNs, and surrogate modeling. SKINNs define a class of M-estimators that are consistent and asymptotically normal with root-N convergence, sandwich covariance, and recovery of pseudo-true parameters under misspecification. We establish identification of structural parameters under joint flexibility, derive generalization and target-risk bounds under distributional shift in a convex proxy, and provide a restricted-optimal characterization of the weighting parameter that governs the bias-variance tradeoff. In an illustrative financial application to option pricing, SKINNs improve out-of-sample valuation and hedging performance, particularly at longer horizons and during high-volatility regimes, while recovering economically interpretable structural parameters with improved stability relative to conventional calibration. More broadly, SKINNs provide a general econometric framework for combining model-based reasoning with high-dimensional, data-driven estimation.
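
The core idea of jointly fitting a flexible model and a structural parameter, with a theory residual enforced at collocation points and weighted by a tuning parameter, can be seen in a linear toy. The sketch below is a penalized least-squares analogue of that kind of objective, not the SKINN estimator itself: a quadratic polynomial stands in for the neural network, the restriction f(z) = θ·z is a hypothetical "theory", and `lam` plays the role of the weighting parameter. Because everything is linear in the parameters, the joint problem reduces to one stacked least-squares solve.

```python
import numpy as np

def fit_skinn_like(x, y, z, lam):
    """Jointly estimate (w0, w1, w2) and theta by minimising
        mean((y - f(x))**2) + lam * mean((f(z) - theta * z)**2),
    where f(x) = w0 + w1*x + w2*x**2 stands in for a neural network and
    f(z) = theta * z is a hypothetical structural restriction checked at
    collocation points z (which may lie outside the observed data)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    n, m = len(x), len(z)
    # Data rows: residual f(x) - y; theta does not enter (last column is 0).
    a_data = np.column_stack([np.ones(n), x, x**2, np.zeros(n)]) / np.sqrt(n)
    b_data = y / np.sqrt(n)
    # Collocation rows: residual f(z) - theta * z, scaled by sqrt(lam / m).
    a_col = np.column_stack([np.ones(m), z, z**2, -z]) * np.sqrt(lam / m)
    b_col = np.zeros(m)
    a = np.vstack([a_data, a_col])
    b = np.concatenate([b_data, b_col])
    params, *_ = np.linalg.lstsq(a, b, rcond=None)
    return params[:3], params[3]

x = np.linspace(0.0, 1.0, 20)      # observed inputs
y = 2.0 * x                        # toy data consistent with theta = 2
z = np.linspace(0.0, 2.0, 50)      # collocation points extend past the data range
w, theta = fit_skinn_like(x, y, z, lam=1.0)
print(np.round(w, 6), round(float(theta), 6))
```

Here θ is recovered as 2 up to floating-point error. In this toy, setting `lam=0` zeroes the collocation rows, so θ is no longer identified and `np.linalg.lstsq` returns its minimum-norm value, 0.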

LG · Aug 20, 2021
Deep Sequence Modeling: Development and Applications in Asset Pricing

Lin William Cong, Ke Tang, Jingyuan Wang et al.

We predict asset returns and measure risk premia using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time series models, sequence modeling offers a promising path with its data-driven approach and superior performance. In this paper, we first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. We then perform a comparative analysis of these methods using data on U.S. equities. We demonstrate how sequence modeling benefits investors in general by incorporating complex historical path dependence, and that Long Short-Term Memory (LSTM) based models tend to have the best out-of-sample performance.
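
For readers new to LSTMs, the sketch below shows the standard LSTM recurrence (input, forget, and output gates plus a candidate cell update) in NumPy, run over a toy sequence of return features with random, untrained weights and a linear forecasting head. It only illustrates how the cell state carries information along the return path; the hidden size, initialization, head, and function names are assumptions, not the architectures or training setup compared in the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, H + D) and b shape (4*H,), holding the
    input, forget, output, and candidate blocks stacked in that order."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_t = f * c_prev + i * g     # cell state carries long-range path dependence
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def run_lstm(returns, H=8, seed=0):
    """Run an untrained LSTM over a (T, D) array of return features and map the
    final hidden state to a one-step-ahead forecast with a linear head."""
    rng = np.random.default_rng(seed)
    D = returns.shape[1]
    W = rng.normal(scale=0.1, size=(4 * H, H + D))
    b = np.zeros(4 * H)
    w_out = rng.normal(scale=0.1, size=H)
    h, c = np.zeros(H), np.zeros(H)
    for x_t in returns:
        h, c = lstm_step(x_t, h, c, W, b)
    return float(w_out @ h), h

rng = np.random.default_rng(1)
toy_returns = rng.normal(0.0, 0.02, size=(60, 3))   # 60 periods, 3 hypothetical features
forecast, h_last = run_lstm(toy_returns)
print(forecast, h_last.shape)
```

In practice the weights would be trained by backpropagation through time on historical returns; this sketch stops at the forward pass.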

CR · May 15, 2020
Blockchain Architecture for Auditing Automation and Trust Building in Public Markets

Sean Cao, Lin William Cong, Meng Han et al.

Business transactions by public firms are required to be reported, verified, and audited periodically, which is traditionally a labor-intensive and time-consuming process. To streamline this procedure, we design FutureAB (Future Auditing Blockchain), which aims to automate the reporting and auditing process, thereby allowing auditors to focus on discretionary accounts to better detect and prevent fraud. We demonstrate how distributed-ledger technologies build investor trust and disrupt the auditing industry. Our multi-functional design indicates that auditing firms can automate transaction verification without the need for a trusted third party by collaborating and sharing their information while preserving data privacy (commitment scheme) and security (immutability). We also explore how smart contracts and wallets facilitate the computerization and implementation of our system on Ethereum. Finally, performance evaluation reveals the efficacy and scalability of FutureAB in terms of both encryption (0.012 seconds per transaction) and verification (0.001 seconds per transaction).
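
The abstract pairs data privacy with a commitment scheme. A common construction, assumed here for illustration and not necessarily the one FutureAB uses, is a salted hash commitment: publish H(nonce ‖ record) now, then later reveal the nonce and record so anyone can check them against the published value. The sketch uses only Python's standard library (`hashlib`, `hmac`, `secrets`); the function names `commit` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets

def commit(record: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). The random 32-byte nonce hides the record,
    so observers cannot confirm guesses about low-entropy transaction data."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + record).digest()
    return commitment, nonce

def verify(commitment: bytes, record: bytes, nonce: bytes) -> bool:
    """Recompute the hash from the revealed (record, nonce) and compare in
    constant time against the published commitment."""
    expected = hashlib.sha256(nonce + record).digest()
    return hmac.compare_digest(commitment, expected)

record = b"txn:1001|amount:250.00"
c, nonce = commit(record)
print(verify(c, record, nonce))                     # True
print(verify(c, b"txn:1001|amount:999.00", nonce))  # False
```

Because the nonce has a fixed length, the concatenation nonce + record is unambiguous; publishing only the commitment on a ledger lets auditors later confirm that a revealed record was not altered.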