Token Maturation: Autoregressive Language Generation via Continuous Token Dynamics
This paper addresses a fundamental limitation of autoregressive language generation, the immediate commitment to a discrete token at every step, and proposes an alternative to standard sampling-based decoding.
The paper tackles premature token discretization in autoregressive language models, which causes degeneration such as repetition loops. It introduces Token Maturation, a continuous framework in which tokens evolve as vector trajectories before discretization. The resulting model generates coherent and diverse text under fully deterministic decoding, without heuristics such as repetition penalties or temperature scaling.
Standard autoregressive language models collapse uncertainty at every generation step by committing to discrete tokens through immediate sampling. This premature discretization underlies well-known failure modes, including degenerate repetition loops in greedy decoding and a heavy reliance on heuristic sampling strategies. We introduce \textbf{Token Maturation}, a continuous autoregressive framework in which tokens evolve as vector-valued trajectories prior to discretization. Rather than sampling from a categorical distribution at each step, the model resolves uncertainty through a deterministic dynamical process in embedding space, deferring discrete commitment until the representation has geometrically stabilized. We show that this formulation mitigates degeneration \emph{intrinsically}: Token Maturation generates coherent and diverse text under fully deterministic decoding (argmax), without repetition penalties, temperature scaling, or stochastic sampling. Moreover, we identify a novel convergence behavior in which token representations stabilize spatially while predictive entropy remains high, challenging the common assumption that commitment requires probability concentration. We propose continuous token dynamics with delayed commitment as an alternative formulation of autoregressive generation that exposes structural regularities obscured by immediate discretization.
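To make delayed commitment concrete, the following is a minimal toy sketch, not the paper's implementation. The abstract does not specify the update rule, so everything here is an assumption made for illustration: the relaxation step `z + step * (target_fn(z) - z)`, the hypothetical `target_fn` standing in for the model's continuous prediction, the displacement-based stopping test, and the dot-product argmax used for discretization. The entropy value is returned only as a diagnostic; the stopping test never looks at it.

```python
import numpy as np

def mature_token(z0, target_fn, embeddings, step=0.5, tol=1e-6, max_iters=200):
    """Evolve a continuous token vector until it stops moving, then discretize.

    All names and the update rule are illustrative assumptions, not the paper's method.
    """
    z = np.asarray(z0, dtype=float)
    iters = 0
    for iters in range(1, max_iters + 1):
        # Deterministic update in embedding space (assumed relaxation dynamics).
        z_next = z + step * (target_fn(z) - z)
        moved = np.linalg.norm(z_next - z)
        z = z_next
        # Commit only once the representation has geometrically stabilized.
        if moved < tol:
            break

    # Deferred discretization: argmax over similarity to the embedding table.
    scores = embeddings @ z
    token_id = int(np.argmax(scores))

    # Diagnostic only: entropy of a softmax over the same scores.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    entropy = float(-(probs * np.log(probs)).sum())
    return token_id, z, iters, entropy

# Toy usage: a 3-token vocabulary with one-hot embeddings and a fixed target.
embeddings = np.eye(3)
target = np.array([0.9, 0.1, 0.0])
token_id, z_final, iters, entropy = mature_token(
    np.zeros(3), lambda z: target, embeddings
)
print(token_id, iters, round(entropy, 3))
```

In this toy setting the stopping criterion depends only on how far the vector moves between iterations, so the trajectory can settle and yield a stable argmax regardless of how concentrated the softmax over the scores is. That separation between geometric stability and probability concentration is the design point the sketch is meant to illustrate; it says nothing about the behavior of the actual model.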