Ryan Wails

h-index6

2papers

105citations

2 Papers

4.9CLNov 1, 2025

Certain but not Probable? Differentiating Certainty from Probability in LLM Token Outputs for Probabilistic Scenarios

Autumn Toney-Wails, Ryan Wails

Reliable uncertainty quantification (UQ) is essential for ensuring trustworthy downstream use of large language models, especially when they are deployed in decision-support and other knowledge-intensive applications. Model certainty can be estimated from token logits, with derived probability and entropy values offering insight into performance on the prompt task. However, this approach may be inadequate for probabilistic scenarios, where the probabilities of token outputs are expected to align with the theoretical probabilities of the possible outcomes. We investigate the relationship between token certainty and alignment with theoretical probability distributions in well-defined probabilistic scenarios. Using GPT-4.1 and DeepSeek-Chat, we evaluate model responses to ten prompts involving probability (e.g., roll a six-sided die), both with and without explicit probability cues in the prompt (e.g., roll a fair six-sided die). We measure two dimensions: (1) response validity with respect to scenario constraints, and (2) alignment between token-level output probabilities and theoretical probabilities. Our results indicate that, while both models achieve perfect in-domain response accuracy across all prompt scenarios, their token-level probability and entropy values consistently diverge from the corresponding theoretical distributions.

9.6CRJan 5, 2018Code

Tempest: Temporal Dynamics in Anonymity Systems

Ryan Wails, Yixin Sun, Aaron Johnson et al.

Many recent proposals for anonymous communication omit from their security analyses a consideration of the effects of time on important system components. In practice, many components of anonymity systems, such as the client location and network structure, exhibit changes and patterns over time. In this paper, we focus on the effect of such temporal dynamics on the security of anonymity networks. We present Tempest, a suite of novel attacks based on (1) client mobility, (2) usage patterns, and (3) changes in the underlying network routing. Using experimental analysis on real-world datasets, we demonstrate that these temporal attacks degrade user privacy across a wide range of anonymity networks, including deployed systems such as Tor; path-selection protocols for Tor such as DeNASA, TAPS, and Counter-RAPTOR; and network-layer anonymity protocols for Internet routing such as Dovetail and HORNET. The degradation is in some cases surprisingly severe. For example, a single host failure or network route change could quickly and with high certainty identify the client's ISP to a malicious host or ISP. The adversary behind each attack is relatively weak - generally passive and in control of one network location or a small number of hosts. Our findings suggest that designers of anonymity systems should rigorously consider the impact of temporal dynamics when analyzing anonymity.