AI HC | Jan 28, 2025

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers

arXiv:2501.17310v47.81 citations | h-index: 7 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the underexplored area of guesstimation in LLMs, offering a practical decoding strategy for real-world estimation tasks, though it is incremental as it builds on known aggregation techniques.

The authors aim to improve large language models' performance on guesstimation tasks. They introduce Wisdom of Crowds decoding, which takes the median of multiple sampled responses and is reported to be more accurate than existing methods such as greedy decoding and self-consistency decoding.

Guesstimation, the task of making approximate quantitative estimates about objects or events, is a common real-world skill, yet it remains underexplored in large language model (LLM) research. We introduce three guesstimation datasets, MARBLES, FUTURE, and ELECPRED, spanning physical estimation (e.g., how many marbles fit in a cup) to abstract predictions (e.g., the 2024 U.S. presidential election). Inspired by the social science concept of Wisdom of Crowds (WOC), in which the median of multiple estimates improves accuracy, we propose WOC decoding for LLMs. We replicate WOC effects in human participants and find that LLMs exhibit similar benefits: median aggregation across sampled responses consistently improves accuracy over greedy decoding, self-consistency decoding, and mean decoding. This suggests that LLMs encode a world model that supports approximate reasoning. Our results position guesstimation as a useful probe of LLM world knowledge and highlight WOC decoding as a strategy for enhancing LLM guesstimation performance on real-world tasks.
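
The aggregation rules compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the sample values are made up for illustration, and it assumes the sampled LLM responses have already been parsed into numbers (sampling and parsing are not described in this summary). Self-consistency is approximated here as a majority vote over the parsed answers.

```python
from collections import Counter
from statistics import mean, median


def woc_decode(estimates):
    """WOC-style aggregation: the median of the sampled numeric estimates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return median(estimates)


def mean_decode(estimates):
    """Mean aggregation: sensitive to extreme outliers."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return mean(estimates)


def self_consistency_decode(estimates):
    """Majority vote: the most frequent answer (ties go to the first seen)."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return Counter(estimates).most_common(1)[0][0]


# Hypothetical parsed answers to "how many marbles fit in a cup?"
# (illustrative values only, not data from the paper).
samples = [120, 150, 150, 180, 200, 5000]

print("median (WOC):", woc_decode(samples))
print("mean:", mean_decode(samples))
print("majority vote:", self_consistency_decode(samples))
```

In this toy example the single outlier (5000) drags the mean far from the bulk of the estimates while leaving the median almost unchanged, which is one intuition for why median aggregation can be more robust for numeric guesses.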
