CL · AI · LG · Apr 25, 2025

One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning

arXiv:2504.18246v2 · 1 citation · h-index: 1 · Has Code · IJCNLP-AACL
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck for researchers and practitioners fine-tuning LLMs on multi-turn reasoning tasks, offering an incremental improvement in efficiency.

The paper tackles the inefficiency of fine-tuning LLMs on multi-turn reasoning datasets, which typically require a separate forward pass for each of the N turns in a conversation. It proposes duplicating response tokens and applying a custom attention mask so that an entire conversation is processed in a single pass. This reduces time complexity from O(N^3) to O(N^2) while preserving accuracy, yielding a significant training speedup.
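To make the idea concrete, here is a minimal Python sketch of how duplicated response tokens and a block-structured attention mask could be laid out. It assumes each turn k is packed as user tokens U_k, reasoning R_k, a trained answer copy A_k, and a duplicated answer copy C_k that later turns attend to. This ordering, the position-id scheme, and every name (`Token`, `build_layout`, `visible`, `one_pass_mask`, `loss_targets`) are illustrative assumptions, not the authors' code from the linked repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    turn: int  # index k of the conversation turn
    kind: str  # "user" (U_k), "reason" (R_k), "answer" (A_k) or "history" (C_k)
    pos: int   # position id passed to the positional encoding


def build_layout(turns: List[Tuple[int, int, int]]) -> List[Token]:
    """Pack every turn as U_k R_k A_k C_k; `turns` holds (user, reason, answer) lengths.

    Hypothetical layout chosen for this sketch only.
    """
    tokens: List[Token] = []
    pos = 0
    for k, (n_user, n_reason, n_answer) in enumerate(turns):
        tokens += [Token(k, "user", pos + i) for i in range(n_user)]
        base = pos + n_user  # first position of this turn's response
        tokens += [Token(k, "reason", base + i) for i in range(n_reason)]
        tokens += [Token(k, "answer", base + n_reason + i) for i in range(n_answer)]
        # Duplicated answer: positions as if the reasoning had been discarded.
        tokens += [Token(k, "history", base + i) for i in range(n_answer)]
        pos = base + n_answer  # later turns follow the reasoning-free history
    return tokens


def visible(q: Token, key: Token) -> bool:
    """Segment-level visibility; the causal constraint is applied separately."""
    if key.turn < q.turn:
        # Earlier turns are seen only via user text and duplicated answers.
        return key.kind in ("user", "history")
    if key.turn > q.turn:
        return False
    if key.kind == "user":
        return True  # every token of turn k sees U_k
    if q.kind == "reason":
        return key.kind == "reason"
    if q.kind == "answer":
        return key.kind in ("reason", "answer")
    if q.kind == "history":
        return key.kind == "history"  # the copy never sees R_k or A_k
    return False  # user tokens see only user tokens of their own turn


def one_pass_mask(tokens: List[Token]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (dense, for clarity)."""
    n = len(tokens)
    return [[j <= i and visible(tokens[i], tokens[j]) for j in range(n)]
            for i in range(n)]


def loss_targets(tokens: List[Token]) -> List[bool]:
    """R_k and A_k are prediction targets; duplicated answers carry no loss."""
    return [t.kind in ("reason", "answer") for t in tokens]


if __name__ == "__main__":
    toks = build_layout([(2, 3, 2), (2, 1, 1)])
    mask = one_pass_mask(toks)
    for i, t in enumerate(toks):
        seen = sorted(toks[j].pos for j in range(len(toks)) if mask[i][j])
        # Each token sees a contiguous prefix of positions, like an N-pass input.
        assert seen == list(range(t.pos + 1))
    print([t.pos for t in toks])
```

Running the file builds a two-turn example and checks that the visible keys of every token carry the contiguous position ids 0..p, which is the positional view the same token would have in the corresponding N-pass input under the layout assumed here. In practice the dense boolean matrix would be handed to a block-sparse attention kernel instead, since visibility is constant within each (query segment, key segment) block.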

Fine-tuning Large Language Models (LLMs) on multi-turn reasoning datasets requires N separate forward passes per conversation, where N is the number of turns, because of reasoning-token visibility constraints: the reasoning tokens for a turn are discarded in subsequent turns. We propose duplicating response tokens and applying a custom attention mask to enable single-pass processing of entire conversations. We prove that our method produces losses identical to those of the N-pass approach while reducing time complexity from $O\bigl(N^{3}\bigr)$ to $O\bigl(N^{2}\bigr)$ and maintaining the same memory complexity for a transformer-based model. Our approach achieves significant training speedup while preserving accuracy. Our implementation is available online (https://github.com/devrev/One-Pass-to-Reason).
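The complexity claim can be checked with a back-of-the-envelope count. This sketch assumes every turn has at most L tokens across user message, reasoning and answer, and that self-attention, whose cost grows with the square of sequence length, dominates; both are simplifying assumptions, not the paper's exact accounting. The k-th pass of the N-pass approach processes at most kL tokens (k - 1 reasoning-free history turns plus the full turn k), while the single pass processes at most 2NL tokens, because duplicating the answer at most doubles the length of each turn.

```latex
\begin{aligned}
\text{N-pass:}\quad & \sum_{k=1}^{N} (kL)^{2} = L^{2}\,\frac{N(N+1)(2N+1)}{6} = O\bigl(N^{3}\bigr),\\
\text{one pass:}\quad & (2NL)^{2} = 4N^{2}L^{2} = O\bigl(N^{2}\bigr).
\end{aligned}
```

Duplication only affects the constant factor; the saving comes from no longer re-encoding the shared conversation history once per turn.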
