LG AI CLMar 24, 2025

Efficient Joint Prediction of Multiple Future Tokens

arXiv:2503.21801v117.98 citationsh-index: 5

Originality Incremental advance

AI Analysis

This addresses the challenge of improving predictive representations in language models, though it appears incremental with preliminary results on a synthetic task.

The authors tackled the problem of enriching hidden state representations in language models by introducing joint multi-token prediction (JTP), a lightweight modification of standard next-token prediction that jointly predicts multiple future tokens with minimal computational overhead. They demonstrated significant performance improvement over existing methods on a synthetic star graph navigation task.

In this short report, we introduce joint multi-token prediction (JTP), a lightweight modification of standard next-token prediction designed to enrich hidden state representations by jointly predicting multiple future tokens. Unlike previous multi-token prediction approaches, JTP strategically employs teacher forcing of future-tokens through a carefully designed representation bottleneck, allowing the model to encode rich predictive information with minimal computational overhead during training. We show that the JTP approach achieves a short-horizon belief state representation, while popular alternatives for multi-token prediction fail to do so. We demonstrate the effectiveness of our method on the synthetic star graph navigation task from from Bachmann and Nagarajan [2024], highlighting a significant performance improvement over existing methods. This manuscript presents promising preliminary results intended to stimulate further research.

View on arXiv PDF

Similar