LG | AI | CL | Feb 16, 2025

Learning to Reason from Feedback at Test-Time

arXiv:2502.15771v2 | 12 citations | h-index: 14 | ACL
Originality: Highly original
AI Analysis

The paper addresses inefficient use of feedback by LLMs on reasoning tasks and offers a scalable approach that improves performance over existing methods.

Large language models (LLMs) struggle to solve complex tasks in a single attempt. The paper introduces FTTT, a paradigm that formulates feedback utilization as an optimization problem at test time, and OpTune, a learnable test-time optimizer. Together they achieve superior scalability and performance on four reasoning datasets.

Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
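
The idea of treating feedback as a test-time optimization signal, as opposed to retrying with an unchanged model, can be illustrated with a toy sketch. This is not the paper's implementation: the softmax "policy" over a handful of candidate answers, the binary `is_correct` verifier, the loss (lowering the log-probability of a failed attempt), and the names `softmax` and `solve_with_feedback` are all assumptions made for illustration, and OpTune is not modeled at all.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def solve_with_feedback(logits, is_correct, lr=1.0, max_attempts=10):
    """Toy loop: greedy attempt -> binary feedback -> gradient step on failure.

    `logits` stands in for model parameters (hypothetical toy policy).
    `is_correct` stands in for environment feedback (hypothetical verifier).
    """
    logits = list(logits)  # copy so the caller's list is not mutated
    history = []
    for _ in range(max_attempts):
        probs = softmax(logits)
        # Greedy decoding: pick the currently most probable candidate.
        attempt = max(range(len(probs)), key=probs.__getitem__)
        history.append(attempt)
        if is_correct(attempt):
            return attempt, history
        # Assumed loss: log p(attempt). Its gradient w.r.t. logit j is
        # 1[j == attempt] - p_j, so one descent step lowers the failed
        # answer's logit and raises all the others.
        for j in range(len(logits)):
            grad = (1.0 if j == attempt else 0.0) - probs[j]
            logits[j] -= lr * grad
    return None, history


# Four candidate answers; the toy policy initially prefers index 0,
# but the verifier accepts only index 2.
answer, history = solve_with_feedback([2.0, 1.0, 0.5, 0.0], lambda a: a == 2)
print(answer, history)
```

With greedy decoding and no update step, this toy policy would return candidate 0 on every retry. The update is what lets each failed attempt change the next one, which is the contrast with naive retries that the abstract draws.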

Code Implementations: 1 repo
