CL · AI · IR · Sep 23, 2025

Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering

Alireza Salemi, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Zhuowan Li, Spurthi Amba Hombaiah, Weize Kong, Tao Chen, Hamed Zamani, Michael Bendersky

DeepMind, Georgia Tech

arXiv:2509.19094v1 · 8.34 citations · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses the challenge of adapting QA systems to individual users' needs, which is essential for improving accuracy and user satisfaction. It is an incremental advance in a relatively underexplored area.

The paper tackles personalized question answering by proposing Pathways of Thoughts (PoT), an inference-stage method that lets large language models explore multiple reasoning trajectories and then aggregates the resulting responses according to inferred user preferences. PoT achieves up to a 13.1% relative improvement on the LaMP-QA benchmark.
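A relative improvement is measured as a fraction of the baseline's score, not in absolute points. Writing s_PoT and s_base for the scores of PoT and the compared baseline on a LaMP-QA metric (this summary does not say which metric or baseline produces the 13.1% figure), the definition and its consequence are shown below. The numeric line uses a hypothetical baseline score of 0.40 purely for illustration.

```latex
\Delta_{\text{rel}} = \frac{s_{\text{PoT}} - s_{\text{base}}}{s_{\text{base}}},
\qquad
\Delta_{\text{rel}} = 0.131 \;\Longrightarrow\; s_{\text{PoT}} = 1.131\, s_{\text{base}}

% Hypothetical example: s_base = 0.40
s_{\text{PoT}} = 1.131 \times 0.40 = 0.4524
```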

Personalization is essential for adapting question answering (QA) systems to user-specific information needs, thereby improving both accuracy and user satisfaction. However, personalized QA remains relatively underexplored due to challenges such as inferring preferences from long, noisy, and implicit contexts, and generating responses that are simultaneously correct, contextually appropriate, and aligned with user expectations and background knowledge. To address these challenges, we propose Pathways of Thoughts (PoT), an inference-stage method that applies to any large language model (LLM) without requiring task-specific fine-tuning. The approach models the reasoning of an LLM as an iterative decision process, where the model dynamically selects among cognitive operations such as reasoning, revision, personalization, and clarification. This enables exploration of multiple reasoning trajectories, producing diverse candidate responses that capture different perspectives. PoT then aggregates and reweights these candidates according to inferred user preferences, yielding a final personalized response that benefits from the complementary strengths of diverse reasoning paths. Experiments on the LaMP-QA benchmark for personalized QA show that PoT consistently outperforms competitive baselines, achieving up to a 13.1% relative improvement. Human evaluation corroborates these results, with annotators preferring outputs from PoT in 66% of cases and reporting ties in only 15% of cases.
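The abstract describes PoT's inference loop only at a high level. As a rough illustration, and not the authors' implementation, the toy Python sketch below mimics its overall shape. Each pathway is an iterative decision process that picks among the four operations the abstract names (reasoning, revision, personalization, clarification) until it stops, and the resulting candidates are then reweighted against user preferences. Everything else is an assumption. `apply_operation` replaces LLM calls with string templates. A seeded random policy replaces the model's own choice of the next operation. Preferences are taken directly from profile keywords instead of being inferred. A keyword-overlap softmax (`preference_score`, `aggregate`) stands in for whatever preference modeling the paper uses. All names (`Pathway`, `run_pathway`, `multi_path_answer`, the `"answer"` stop action) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

# Operations named in the abstract; "answer" is a hypothetical stop action.
OPERATIONS = ["reason", "revise", "personalize", "clarify"]


@dataclass
class Pathway:
    """One reasoning trajectory: the operations chosen and the draft they produced."""
    operations: list[str] = field(default_factory=list)
    draft: str = ""


def apply_operation(op: str, question: str, draft: str, profile: list[str]) -> str:
    """Hypothetical stand-in for an LLM call that applies one operation to the draft."""
    if op == "reason":
        return draft + f" [reasoning about: {question}]"
    if op == "revise":
        return draft + " [revised]"
    if op == "personalize":
        return draft + f" [tailored to: {', '.join(profile)}]"
    return draft + " [clarified terms]"  # op == "clarify"


def run_pathway(question: str, profile: list[str], rng: random.Random,
                max_steps: int = 4) -> Pathway:
    """Iterative decision process: start by reasoning, then pick an operation or stop."""
    path = Pathway(operations=["reason"],
                   draft=apply_operation("reason", question, "", profile))
    for _ in range(max_steps - 1):
        # A seeded random policy stands in for the LLM's own choice of next operation.
        choice = rng.choice(OPERATIONS + ["answer"])
        if choice == "answer":
            break
        path.operations.append(choice)
        path.draft = apply_operation(choice, question, path.draft, profile)
    return path


def preference_score(text: str, preferences: list[str]) -> float:
    """Toy proxy: fraction of preference aspects mentioned in the text."""
    lowered = text.lower()
    hits = sum(1 for p in preferences if p.lower() in lowered)
    return hits / max(len(preferences), 1)


def aggregate(paths: list[Pathway], preferences: list[str],
              temperature: float = 0.1) -> tuple[list[float], str]:
    """Softmax-reweight candidates by preference score; return weights and top draft."""
    scores = [preference_score(p.draft, preferences) for p in paths]
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    best = max(range(len(paths)), key=weights.__getitem__)
    return weights, paths[best].draft


def multi_path_answer(question: str, profile: list[str], num_paths: int = 5,
                      seed: int = 0) -> tuple[list[Pathway], list[float], str]:
    """Explore several trajectories, then reweight them against user preferences."""
    rng = random.Random(seed)
    paths = [run_pathway(question, profile, rng) for _ in range(num_paths)]
    # Assumption: preferences are taken directly from the profile keywords here.
    weights, answer = aggregate(paths, preferences=profile)
    return paths, weights, answer


paths, weights, answer = multi_path_answer(
    "How should I start strength training?",
    profile=["beginner", "home workouts"],
)
for path, weight in zip(paths, weights):
    print(f"{weight:.2f}", path.operations)
print(answer)
```

Returning only the top-weighted draft is a simplification. Per the abstract, PoT aggregates the candidates into a single final response, so a closer approximation would pass the weighted candidates to a final LLM call that merges them.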
