LG AI CL | Feb 16, 2025

Learning to Reason from Feedback at Test-Time

arXiv:2502.15771v2 | 12 citations | h-index: 14 | ACL

Originality: Highly original

AI Analysis

This work addresses the inefficient use of feedback by LLMs on reasoning tasks, offering a scalable approach that improves performance over existing methods.

Large language models (LLMs) often fail to solve complex tasks in a single attempt. The paper introduces FTTT, a paradigm that formulates feedback utilization as an optimization problem at test time, and OpTune, a learnable test-time optimizer. The authors report superior scalability and performance on four reasoning datasets.

Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
