CL, Jul 8, 2024
Large Language Model Recall Uncertainty is Modulated by the Fan Effect
Jesse Roberts, Kyle Moore, Thao Pham et al.
This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan effect. Our results show that removing uncertainty disrupts the observed effect. The experiments suggest the fan effect is consistent whether the fan value is induced in-context or in the pre-training data. Finally, these findings provide in-silico evidence that fan effects and typicality are expressions of the same phenomenon.
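As a rough illustration of measuring recall uncertainty from token probabilities, the sketch below renormalizes log-probabilities over a small set of candidate answer tokens and reports their entropy. The candidate tokens, the numbers, and the choice of entropy as the uncertainty score are assumptions made for illustration; the abstract does not specify the exact measure used.

```python
import math

def normalized_probs(logprobs):
    """Renormalize log-probabilities over a closed set of candidate tokens."""
    m = max(logprobs.values())
    # Subtract the max before exponentiating for numerical stability.
    exps = {tok: math.exp(lp - m) for tok, lp in logprobs.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def entropy_bits(probs):
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical log-probabilities for a yes/no recall probe.
confident = normalized_probs({"yes": math.log(0.9), "no": math.log(0.1)})
uncertain = normalized_probs({"yes": math.log(0.5), "no": math.log(0.5)})
```

Under this reading, a fan effect would show up as the entropy (or, equivalently, one minus the probability of the correct token) rising with the fan value of the probed item.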
CL, Aug 16, 2024
Chain of Thought Still Thinks Fast: APriCoT Helps with Thinking Slow
Kyle Moore, Jesse Roberts, Thao Pham et al.
Language models are known to absorb biases from their training data, leading to predictions driven by statistical regularities rather than semantic relevance. We investigate the impact of these biases on answer choice preferences in the Massive Multi-Task Language Understanding (MMLU) task. Our findings show that these biases are predictive of model preference and mirror human test-taking strategies even when chain of thought (CoT) reasoning is used. To address this issue, we introduce Counterfactual Prompting with Agnostically Primed CoT (APriCoT). We demonstrate that while Counterfactual Prompting with CoT alone is insufficient to mitigate bias, APriCoT effectively reduces the influence of base-rate probabilities while improving overall accuracy. Our results suggest that mitigating bias requires a slow thinking process, which CoT alone may not provide because it tends to reinforce fast-thinking model bias under some prompting methodologies. APriCoT is a step toward developing more robust and fair language models that can think slow.
CL, Oct 11, 2025
Scheming Ability in LLM-to-LLM Strategic Interactions
Thao Pham
As large language model (LLM) agents are deployed autonomously in diverse contexts, evaluating their capacity for strategic deception becomes crucial. While recent research has examined how AI systems scheme against human developers, LLM-to-LLM scheming remains underexplored. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks: a Cheap Talk signaling game and a Peer Evaluation adversarial game. Testing four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b), we measure scheming performance with and without explicit prompting while analyzing scheming tactics through chain-of-thought reasoning. When prompted, most models, especially Gemini-2.5-pro and Claude-3.7-Sonnet, achieved near-perfect performance. Critically, models exhibited significant scheming propensity without prompting: all models chose deception over confession in Peer Evaluation (100% rate), while models choosing to scheme in Cheap Talk succeeded at 95-100% rates. These findings highlight the need for robust evaluations using high-stakes game-theoretic scenarios in multi-agent settings.
CL, Jun 17, 2024
The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
Kyle Moore, Jesse Roberts, Thao Pham et al.
Cloze testing is a common method for measuring the behavior of large language models on a number of benchmark tasks. Using the MMLU dataset, we show that the base-rate probability (BRP) differences across answer tokens are significant and affect task performance, i.e., guess A if uncertain. We find that counterfactual prompting sufficiently mitigates the BRP effect. The BRP effect is found to have a similar effect to test-taking strategies employed by humans, leading to the conflation of task performance and test-taking ability. We propose the Nvr-X-MMLU task, a variation of MMLU, which helps to disambiguate test-taking ability from task performance and reports the latter.
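The "guess A if uncertain" idea can be made concrete with a small sketch. It assumes the base rate of each answer label is estimated as the mean probability the model assigns to that label across prompts, and then scores a guesser that always picks the label with the highest base rate. The function names, the estimation procedure, and the toy numbers are hypothetical and are not taken from the paper.

```python
LABELS = ["A", "B", "C", "D"]

def base_rate(prob_rows):
    """Mean probability assigned to each answer label across prompts."""
    n = len(prob_rows)
    return {lab: sum(row[lab] for row in prob_rows) / n for lab in LABELS}

def brp_guess_accuracy(brp, answer_key):
    """Accuracy of always answering with the highest-base-rate label."""
    guess = max(LABELS, key=lambda lab: brp[lab])
    hits = sum(1 for answer in answer_key if answer == guess)
    return guess, hits / len(answer_key)

# Toy per-prompt distributions over answer labels (illustrative only).
rows = [
    {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2},
    {"A": 0.3, "B": 0.3, "C": 0.2, "D": 0.2},
]
brp = base_rate(rows)
guess, acc = brp_guess_accuracy(brp, ["A", "B", "C", "A"])
```

If such a content-blind guesser scores above chance on a benchmark, part of the measured accuracy reflects test-taking strategy rather than task performance, which is the conflation the abstract describes.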
CV, Jul 2, 2020
Domain Adaptation for Robust Workload Level Alignment Between Sessions and Subjects using fNIRS
Boyang Lyu, Thao Pham, Giles Blaney et al.
Significance: We demonstrated the potential of using domain adaptation on functional Near-Infrared Spectroscopy (fNIRS) data to classify different levels of n-back tasks that involve working memory. Aim: Domain shift in fNIRS data is a challenge in the workload level alignment across different experiment sessions and subjects. To address this problem, two domain adaptation approaches, Gromov-Wasserstein (G-W) and Fused Gromov-Wasserstein (FG-W), were used. Approach: Specifically, we used labeled data from one session or one subject to classify trials in another session (within the same subject) or another subject. We applied G-W for session-by-session alignment and FG-W for subject-by-subject alignment to fNIRS data acquired during different n-back task levels. We compared these approaches with three supervised methods: multi-class Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). Results: In a sample of six subjects, G-W resulted in an alignment accuracy of 68 $\pm$ 4 % (weighted mean $\pm$ standard error) for session-by-session alignment, and FG-W resulted in an alignment accuracy of 55 $\pm$ 2 % for subject-by-subject alignment. In each of these cases, 25 % accuracy represents chance. Alignment accuracy results from both G-W and FG-W are significantly greater than those from SVM, CNN and RNN. We also showed that removal of motion artifacts from the fNIRS data plays an important role in improving alignment performance. Conclusions: Domain adaptation has potential for session-by-session and subject-by-subject alignment of mental workload by using fNIRS data.
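One common way to turn an optimal-transport alignment into a classifier is to push source labels through the coupling matrix. The sketch below assumes a coupling has already been computed (for example by a G-W or FG-W solver) and assigns each target trial the class that receives the most transported mass. This label-transfer rule, the function name, and the toy coupling are assumptions for illustration; the abstract does not state how the authors derive predictions from the alignment.

```python
def transfer_labels(coupling, source_labels, n_classes):
    """Predict target labels from a source-by-target coupling matrix.

    coupling[i][j] is the mass transported from source trial i to target
    trial j; each target trial takes the class with the most incoming mass.
    """
    n_target = len(coupling[0])
    predictions = []
    for j in range(n_target):
        mass = [0.0] * n_classes
        for i, row in enumerate(coupling):
            mass[source_labels[i]] += row[j]
        predictions.append(max(range(n_classes), key=lambda c: mass[c]))
    return predictions

# Toy coupling between 3 labeled source trials and 2 target trials.
coupling = [
    [0.30, 0.03],
    [0.05, 0.28],
    [0.04, 0.30],
]
predicted = transfer_labels(coupling, source_labels=[0, 1, 1], n_classes=4)
accuracy = sum(p == t for p, t in zip(predicted, [0, 1])) / 2
```

With four n-back levels, a predictor that ignores the data would be right about 25 % of the time, which is the chance level the abstract cites.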