LG · AI · CL · May 14

Dynamic Latent Routing

arXiv:2605.14323 · 64.5
Predicted impact: top 31% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners fine-tuning language models with limited data, DLR matches or outperforms standard supervised fine-tuning (SFT), whereas prior discrete-latent baselines consistently underperform it.

The paper introduces Dynamic Latent Routing (DLR), a post-training method for language models that jointly learns discrete latent codes, routing policies, and model parameters via dynamic search. In low-data fine-tuning, DLR matches or outperforms supervised fine-tuning across four datasets and six models, achieving a mean gain of +6.6 percentage points.

We investigate the temporal concatenation of sub-policies in Markov Decision Processes (MDPs) with time-varying reward functions. We introduce General Dijkstra Search (GDS), and prove that globally optimal goal-reaching policies can be recovered through temporal composition of intermediate optimal sub-policies. Motivated by the "search, select, update" principle underlying GDS, we propose Dynamic Latent Routing (DLR), a language-model post-training method that jointly learns discrete latent codes, routing policies, and model parameters through dynamic search in a single training stage. In low-data fine-tuning settings, DLR matches or outperforms supervised fine-tuning across four datasets and six models, achieving a mean gain of +6.6 percentage points, while prior discrete-latent baselines consistently underperform SFT. Mechanistic analyses and targeted code ablations show that DLR learns structured routing behaviors with distinct causal roles.
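
The composition claim can be illustrated with the classic Dijkstra algorithm on a small deterministic graph. This is a minimal sketch, not the paper's General Dijkstra Search: it assumes static, non-negative step costs, whereas GDS is stated for time-varying rewards. The graph, the node names, and the mapping of the "search, select, update" labels onto the loop are illustrative assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Classic Dijkstra over non-negative edge costs (not the paper's GDS).

    Returns (dist, parent): the cheapest cost to each reachable node and
    the predecessor of each node on one cheapest path from `source`.
    """
    dist = {source: 0.0}
    parent = {source: None}
    frontier = [(0.0, source)]
    settled = set()
    while frontier:
        # "search": take the cheapest node still on the frontier
        d, u = heapq.heappop(frontier)
        if u in settled:
            continue
        # "select": the cost of u is now final
        settled.add(u)
        for v, cost in graph.get(u, {}).items():
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                # "update": record the improved route to v
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(frontier, (candidate, v))
    return dist, parent


def path_to(parent, goal):
    """Follow predecessor links back from `goal` to the source."""
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Hypothetical toy graph: node -> {neighbour: step cost}
graph = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 2.0, "g": 6.0},
    "b": {"g": 1.0},
    "g": {},
}

dist, parent = dijkstra(graph, "s")
full_path = path_to(parent, "g")

# Composition check: the optimal route to the intermediate node "b",
# followed by the optimal route from "b" to the goal, costs exactly as
# much as the optimal route from "s" to the goal.
dist_from_b, _ = dijkstra(graph, "b")
composed_cost = dist["b"] + dist_from_b["g"]
```

Here the cheapest route from `s` to `g` passes through `b`, so concatenating the two optimal sub-routes reproduces the globally optimal cost. The paper's result concerns the harder setting in which rewards change over time.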

