GT Jun 5

Evidence Markets

Safwan Hossain, Gabriel Andrade, Chengqi Zang, Yiling Chen

arXiv:2606.07434 13.7

Originality: Incremental advance

AI Analysis

For prediction market designers and users, this work addresses the limitations of traditional prediction markets by enabling evidence-based reasoning and endogenous resolution, but the contribution is incremental as it builds on existing market scoring rules.

The paper introduces evidence markets, a generalization of prediction markets that incentivizes submission of evidence alongside beliefs and can be endogenously resolved using crowd-sourced evidence. It proves that platform loss is bounded and that, under endogenous resolution, truthful reporting is ε-DSIC, and it proposes an LLM-as-a-Judge verification framework.

Modern prediction markets face two limitations that restrict their applicability in a range of settings: (i) they reveal what the crowd believes but not the evidence or reasoning behind those beliefs, and (ii) they require an event with an external ground truth that resolves at a known future date. We address these twin challenges by introducing evidence markets, a generalization of prediction markets that incentivizes the submission of evidence alongside beliefs and can be endogenously resolved using the crowd-sourced evidence if external resolution is not possible. At its core, the market uses a logarithmic market scoring rule whose liquidity parameter changes dynamically with the accumulated evidence quality. We prove that platform loss is bounded, that evidence is rewarded in proportion to the current market uncertainty, and that the market can be equivalently implemented through an automated market maker. In the case where the market resolves endogenously based on submitted evidence, we characterize how withholding evidence shifts a trader's belief about resolution and use it to prove that truthful reporting of beliefs and evidence is always $\varepsilon$-dominant-strategy incentive compatible ($\varepsilon$-DSIC). To address operational considerations, we propose evidence verification via an LLM-as-a-Judge framework with staking and give an asynchronous execution algorithm that is not bottlenecked by verification. Throughout the work, we use LLM evaluations, that is, determining which model is best for a given task, as a salient and representative running example for our proposed market.
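
For readers unfamiliar with the logarithmic market scoring rule (LMSR) that the abstract builds on, the following is a minimal sketch of the standard, textbook LMSR with a fixed liquidity parameter b. It is not the paper's mechanism: the abstract states that the liquidity parameter changes with accumulated evidence quality but does not give the update rule, so none is modeled here. The function names are hypothetical and chosen only for illustration.

```python
import math

def lmsr_cost(q, b):
    """Standard LMSR cost C(q) = b * log(sum_i exp(q_i / b)).

    q is the list of outstanding shares per outcome; b > 0 is the
    liquidity parameter (held fixed here, which is an assumption).
    The max is subtracted first for numerical stability.
    """
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def lmsr_prices(q, b):
    """Instantaneous prices: a softmax of q / b, summing to 1."""
    m = max(q)
    weights = [math.exp((qi - m) / b) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

def trade_cost(q, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    new_q = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)

def worst_case_loss(n_outcomes, b):
    """Well-known LMSR bound on market-maker loss, starting from q = 0."""
    return b * math.log(n_outcomes)

if __name__ == "__main__":
    b = 10.0
    q = [0.0, 0.0]                      # two outcomes, no shares sold yet
    print(lmsr_prices(q, b))            # uniform prices at the start
    print(trade_cost(q, [5.0, 0.0], b)) # cost of 5 shares of outcome 0
    print(worst_case_loss(2, b))        # b * ln(2)
```

In the fixed-b case the market maker's loss never exceeds b ln(n) for n outcomes, which is the kind of bounded-loss guarantee the abstract claims for its dynamic-liquidity variant. A larger b makes prices move less per share traded, at the cost of a larger worst-case subsidy.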
