CL · LG · Nov 23, 2024

Improving Next Tokens via Second-to-Last Predictions with Generate and Refine

arXiv:2411.15661v2 · 1 citation · h-index: 1 · IDA
Originality: Incremental advance
AI Analysis

This work targets next-token prediction accuracy in autoregressive language models and offers researchers and practitioners an incremental improvement.

The paper aims to improve next-token predictions in autoregressive language models. It trains a decoder-only architecture to predict the second-to-last token of a sequence; these predictions are more than 15% more accurate than standard next-token predictions. A generate-then-refine approach then combines the two kinds of prediction, giving smaller but consistent and significant gains in next-token accuracy.

Autoregressive language models like GPT aim to predict next tokens, while autoencoding models such as BERT are trained on tasks such as predicting masked tokens. We train a decoder-only architecture for predicting the second to last token for a sequence of tokens. Our approach yields higher computational training efficiency than BERT-style models by employing a structured deterministic approach to masking tokens. We use our model to improve the next token predictions of a standard GPT by combining both predictions in a "generate-then-refine" approach. We demonstrate on different variants of GPT-2 and different datasets that (not unexpectedly) second to last token predictions are much more accurate, i.e., more than 15% higher accuracy than standard next token predictions. The "generate-then-refine" approach also demonstrates notable improvements in next-token predictions, yielding smaller yet consistent and significant gains.
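The abstract does not spell out the masking scheme or how the two predictions are combined, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that (a) training examples are built by deterministically hiding the second-to-last token of each prefix, and (b) refinement rescores the top-k next-token candidates by multiplying the standard model's probability with the second-to-last model's probability after one lookahead token. The names `MASK`, `second_to_last_examples`, `generate_then_refine`, and the two probability callbacks are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper

# A "model" here is any function mapping a token list to {token: probability}.
ProbFn = Callable[[List[str]], Dict[str, float]]


def second_to_last_examples(
    tokens: Sequence[str], min_len: int = 3
) -> List[Tuple[List[str], str]]:
    """Build (input, target) pairs by hiding the second-to-last token.

    Assumption: one example per prefix of length >= min_len, with the
    masked position fixed (deterministic) rather than sampled as in BERT.
    """
    examples = []
    for end in range(min_len, len(tokens) + 1):
        prefix = list(tokens[:end])
        target = prefix[-2]
        prefix[-2] = MASK
        examples.append((prefix, target))
    return examples


def generate_then_refine(
    context: Sequence[str],
    next_token_probs: ProbFn,
    second_to_last_probs: ProbFn,
    k: int = 2,
) -> str:
    """Pick the next token by rescoring the standard model's top-k candidates.

    Assumption: scores are combined by a simple product; the paper may
    combine them differently.
    """
    first = next_token_probs(list(context))
    candidates = sorted(first, key=first.get, reverse=True)[:k]
    scores = {}
    for cand in candidates:
        # Generate: greedily extend by one token after the candidate.
        follow = next_token_probs(list(context) + [cand])
        lookahead = max(follow, key=follow.get)
        # Refine: ask the second-to-last model what fits the masked slot.
        refined = second_to_last_probs(list(context) + [MASK, lookahead])
        scores[cand] = first[cand] * refined.get(cand, 0.0)
    return max(scores, key=scores.get)
```

In practice the two callbacks would wrap a GPT-2 style model and the second-to-last model; here they are left abstract so the control flow is visible without any library dependency.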

