CL, IT, LG | Feb 25, 2025

What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions?

Yizhe Zhang, Richard Bai, Zijin Gu, Ruixiang Zhang, Jiatao Gu, Emmanuel Abbe, Samy Bengio, Navdeep Jaitly

Apple

arXiv:2502.18435v3 | 6.72 citations | h-index: 91 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of optimizing inductive biases in LLMs for knowledge extraction and reasoning tasks, offering incremental improvements in specific domains.

The paper investigates whether alternative factorizations of the text distribution, specifically right-to-left (R2L) training, improve language model performance on multiple-choice questions (MCQs). It finds that R2L models can significantly outperform left-to-right (L2R) models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks.

Language models usually use left-to-right (L2R) autoregressive factorization. However, L2R factorization may not always be the best inductive bias. Therefore, we investigate whether alternative factorizations of the text distribution could be beneficial in some tasks. We investigate right-to-left (R2L) training as a compelling alternative, focusing on multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across various model sizes (2B-8B parameters) and training datasets, we find that R2L models can significantly outperform L2R models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors including calibration, computability, and directional conditional entropy. We analyze the impact of these factors through controlled simulation studies using arithmetic tasks, where the impacting factors can be better disentangled. Our work demonstrates that exploring alternative factorizations of the text distribution can lead to improvements in LLM capabilities and provides theoretical insights into optimal factorization towards approximating human language distribution, and when each reasoning order might be more advantageous. Our code and checkpoints are released at https://github.com/apple/ml-reversal-blessing.
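To make the idea of alternative factorizations concrete, here is a minimal sketch on a toy joint distribution over (question, answer) pairs. The probability table, the function names, and the reverse scoring rule p(question | answer) are illustrative assumptions, not the paper's evaluation protocol or code. The sketch shows two things: an exact distribution yields the same joint probability under either factorization order, and yet the answer chosen can depend on which conditional is used for scoring.

```python
from collections import defaultdict

# Hypothetical toy joint distribution p(question, answer); values are made up.
JOINT = {
    ("q0", "A"): 0.1,
    ("q0", "B"): 0.2,
    ("q1", "A"): 0.1,
    ("q1", "B"): 0.6,
}


def marginal(index):
    """Sum the joint over the other variable (index 0: question, 1: answer)."""
    out = defaultdict(float)
    for pair, p in JOINT.items():
        out[pair[index]] += p
    return dict(out)


P_Q = marginal(0)  # p(question)
P_A = marginal(1)  # p(answer)


def l2r_joint(q, a):
    """Forward-order factorization: p(q) * p(a | q)."""
    return P_Q[q] * (JOINT[(q, a)] / P_Q[q])


def r2l_joint(q, a):
    """Reverse-order factorization: p(a) * p(q | a)."""
    return P_A[a] * (JOINT[(q, a)] / P_A[a])


def pick_forward(q):
    """Choose the answer maximizing p(a | q)."""
    return max(P_A, key=lambda a: JOINT[(q, a)] / P_Q[q])


def pick_reverse(q):
    """Choose the answer maximizing p(q | a) (assumed reverse scoring rule)."""
    return max(P_A, key=lambda a: JOINT[(q, a)] / P_A[a])


if __name__ == "__main__":
    for (q, a) in JOINT:
        print(q, a, l2r_joint(q, a), r2l_joint(q, a))
    print("forward pick for q0:", pick_forward("q0"))
    print("reverse pick for q0:", pick_reverse("q0"))
```

In this toy table, answer "B" has a high prior, so scoring by p(a | q) favors it for "q0", while scoring by p(q | a) divides that prior out and favors "A". Trained models only approximate the true distribution, so the two directions can differ in further ways that this sketch does not capture.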
