LG · Apr 6

The Role of Generator Access in Autoregressive Post-Training

arXiv:2604.04855 · 77.9

Predicted impact: top 19% in LG · last 90 days
Originality: Highly original

AI Analysis

This work addresses how the interface through which a learner queries a generator constrains autoregressive post-training, with potential implications for training efficiency in language generation tasks.

The paper investigates how the level of access to a generator affects autoregressive post-training. It finds that even weak prefix control, meaning the ability to return to previously built prefixes, removes the barrier faced by root-start rollouts, and that changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.

We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
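As a rough illustration of why the on-policy probability of reaching a prefix matters, the sketch below uses a toy model that is an assumption of this note, not the paper's construction: each token independently matches a hidden target prefix with probability `p`, and the learner is told whether a match occurred. All function names are hypothetical. Under these assumptions, a fresh root-start rollout reaches a target prefix of length `depth` with probability `p ** depth`, while a learner that can return to the longest matching prefix built so far needs about `depth / p` next-token queries in expectation.

```python
import random


def expected_root_start_rollouts(p: float, depth: int) -> float:
    """Expected fresh rollouts until one passes through the target prefix.

    One rollout matches all `depth` tokens with probability p ** depth,
    so the waiting time is geometric with mean 1 / p ** depth.
    """
    return 1.0 / (p ** depth)


def expected_prefix_control_queries(p: float, depth: int) -> float:
    """Expected next-token queries when the learner may return to a built prefix.

    Each level is retried on its own until it matches (mean 1 / p tries),
    and the levels add up instead of multiplying.
    """
    return depth / p


def simulate_prefix_control(p: float, depth: int, rng: random.Random) -> int:
    """Monte Carlo count of queries needed to extend a prefix level by level."""
    queries = 0
    for _ in range(depth):
        while True:
            queries += 1
            if rng.random() < p:  # sampled token matches the target at this level
                break
    return queries


if __name__ == "__main__":
    p, depth = 0.5, 20
    rng = random.Random(0)
    trials = [simulate_prefix_control(p, depth, rng) for _ in range(2000)]
    print("root-start rollouts (expected):", expected_root_start_rollouts(p, depth))
    print("prefix-control queries (expected):", expected_prefix_control_queries(p, depth))
    print("prefix-control queries (simulated mean):", sum(trials) / len(trials))
```

The toy model only contrasts exponential and linear scaling in `depth`; it does not model KL regularization, rewards, or the richer observation types (top-$k$ reports, logits) discussed in the abstract.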
