CL · Jan 29

Temporal Guidance for Large Language Models

arXiv:2601.21744v1
Originality: Incremental advance
AI Analysis

Aimed at researchers and practitioners, this work addresses efficiency issues in LLM decoding and offers an incremental improvement over existing contrastive methods.

The paper tackles the computational overhead of contrastive decoding in large language models by proposing Temporal Guidance (TeGu), a method that uses multi-token prediction for self-contrast, achieving significant performance improvements with low additional resource costs across various models and benchmarks.

Contrastive Decoding (CD) enhances the generation quality of large language models (LLMs) but incurs significant additional computational overhead due to the need for an auxiliary model. Existing internal self-contrastive decoding methods, such as Decoding by Contrasting Layers (DoLa), focus on discrepancies across different layers, which are notably unstable on small-scale models. In this work, based on the observation that LLMs exhibit local preferences, we propose a novel contrastive guidance strategy along the temporal dimension, namely Temporal Guidance (TeGu). Our method ingeniously leverages Multi-Token Prediction (MTP) to construct weaker amateur predictions for model self-contrast. To standardize the implementation of this mechanism, we further introduce a lightweight Conditional MTP Projector (cMTPP), which avoids maintaining multiple independent networks as required by other MTP modules. Across various model series and benchmarks, TeGu achieves significant performance improvements while maintaining low additional memory consumption and computational overhead.
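
The abstract does not state the exact scoring rule TeGu uses, so the following is only a minimal sketch of the generic contrastive decoding step that such self-contrast methods build on. It assumes the commonly used form: expert log-probabilities are boosted by their gap to the amateur's, and only tokens the expert itself finds plausible are eligible. In a TeGu-like setting the amateur logits would presumably come from an MTP head predicting the same position from an earlier context. That mapping, the function names, and the parameters `alpha` and `tau` are all illustrative assumptions, not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=0.5, tau=0.1):
    """Generic contrastive decoding scores (hypothetical helper).

    expert_logits:  next-token logits from the ordinary prediction head.
    amateur_logits: logits for the same position from a weaker predictor
                    (assumed here to be an MTP-style prediction made from
                    an earlier context).
    alpha:          contrast strength; alpha=0 recovers the expert ranking.
    tau:            plausibility threshold relative to the expert's top
                    token; tokens below it are masked out.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    # Keep only tokens whose expert probability is at least tau * max prob.
    cutoff = max(expert_lp) + math.log(tau)
    scores = []
    for e, a in zip(expert_lp, amateur_lp):
        if e < cutoff:
            scores.append(float("-inf"))
        else:
            scores.append((1 + alpha) * e - alpha * a)
    return scores


def pick_token(expert_logits, amateur_logits, alpha=0.5, tau=0.1):
    """Greedy choice under the contrastive scores."""
    scores = contrastive_scores(expert_logits, amateur_logits, alpha, tau)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens: the expert slightly prefers token 0,
# while the amateur prefers it strongly, so the contrast favors token 1.
expert = [2.0, 1.9, -5.0]
amateur = [2.0, 0.0, 0.0]
chosen = pick_token(expert, amateur)
```

The plausibility cutoff matters in this sketch: without it, a token the expert considers very unlikely could win simply because the amateur rates it even lower.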

