LGApr 14
Agentic Control in Variational Language ModelsYves Ruffenach
We study whether a variational language model can support a minimal and measurable form of agentic control grounded in its own internal evidence. Our model combines local variational hidden computation (EVE), a homeostatic latent regulator, structurally aware checkpoint retention and a calibrated uncertainty-aware controller operating on top of the retained model. Rather than treating uncertainty as a passive diagnostic measured after prediction, we treat it as an operational signal that can regulate training, support checkpoint retention and guide inference-time intervention. The resulting framework is deliberately focused. It studies a closed-loop form of internal control in which structural and predictive signals become actionable. Empirically, the variational backbone improves over a matched deterministic reference on the language-modeling task while also exhibiting a richer and more usable uncertainty profile. On top of this backbone, the calibrated controller remains active, uses multiple actions under a full agentic evaluation and yields a positive quality-cost trade-off. These results support a precise claim: internal uncertainty can serve not only as a descriptive property of a variational language model, but also as a practical control interface for regulation, checkpoint retention and minimal agentic routing.
LGMar 30
Variational Neurons in Transformers for Language ModelingYves Ruffenach
Transformers for language modeling usually rely on deterministic internal computation, with uncertainty expressed mainly at the output layer. We introduce variational neurons into Transformer feed-forward computation so that uncertainty becomes part of the internal computation itself. Concretely, we replace deterministic feed-forward units with local variational units based on EVE while preserving the overall Transformer backbone. We evaluate this design in compact next-token language-modeling settings. We compare deterministic and variational variants with both predictive and probabilistic criteria. Alongside negative log-likelihood, perplexity and accuracy, we analyze calibration, conditional variance, mutual information and latent-usage statistics. The resulting picture is clear. Variational neurons integrate stably into Transformers, preserve strong predictive performance and produce informative uncertainty signals. The experiments also show that task quality, useful depth and internal stability are distinct properties. These results establish variational Transformers as a practical form of uncertainty-aware language modeling. They show that Transformers can predict with an explicit internal structure of uncertainty, which supports stronger probabilistic evaluation and a more informative analysis of model behavior.
LGMar 14
Exploring the Dimensions of a Variational NeuronYves Ruffenach
We introduce EVE (Elemental Variational Expanse), a variational distributional neuron formulated as a local probabilistic computational unit with an explicit prior, an amortized posterior, and unit-level variational regularization. In most modern architectures, uncertainty is modeled through global latent variables or parameter uncertainty, while the computational unit itself remains scalar. EVE instead relocates probabilistic structure to the neuron level, making it locally observable and controllable. In this paper, the term dimensions refers primarily to the neuron's internal latent dimensionality, denoted by k. We study how varying k, from the atomic case k = 1 to higher-dimensional latent spaces, changes the neuron's learned operating regime. We then examine how this main axis interacts with two additional structural properties: local capacity control and temporal persistence through a neuron-level autoregressive extension. To support this study, EVE is instrumented with internal diagnostics and constraints, including effective KL, a target band on mu^2, out-of-band fractions, and indicators of drift and collapse. Across selected forecasting and tabular settings, we show that latent dimensionality, control, and temporal extension shape the neuron's internal regime, and that some neuron-level variables are measurable, informative, and related to downstream behavior. Overall, the paper provides an experimentally grounded first map of the design space opened by a variational neuron.
LGFeb 20
Variational Distributional NeuronYves Ruffenach
We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE brick, explicitly carrying a prior, an amortized posterior and a local ELBO. The unit is no longer a deterministic scalar but a distribution: computing is no longer about propagating values, but about contracting a continuous space of possibilities under constraints. Each neuron parameterizes a posterior, propagates a reparameterized sample and is regularized by the KL term of a local ELBO - hence, the activation is distributional. This "contraction" becomes testable through local constraints and can be monitored via internal measures. The amount of contextual information carried by the unit, as well as the temporal persistence of this information, are locally tuned by distinct constraints. This proposal addresses a structural tension: in sequential generation, causality is predominantly organized in the symbolic space and, even when latents exist, they often remain auxiliary, while the effective dynamics are carried by a largely deterministic decoder. In parallel, probabilistic latent models capture factors of variation and uncertainty, but that uncertainty typically remains borne by global or parametric mechanisms, while units continue to propagate scalars - hence the pivot question: if uncertainty is intrinsic to computation, why does the compute unit not carry it explicitly? We therefore draw two axes: (i) the composition of probabilistic constraints, which must be made stable, interpretable and controllable; and (ii) granularity: if inference is a negotiation of distributions under constraints, should the primitive unit remain deterministic or become distributional? We analyze "collapse" modes and the conditions for a "living neuron", then extend the contribution over time via autoregressive priors over the latent, per unit.
LGDec 10, 2025
Latent-Autoregressive GP-VAE Language ModelYves Ruffenach
We investigate a fully Latent AutoRegressive scheme based on a Gaussian Process (GP) integrated into a Variational Autoencoder (VAE). In this setting, sequential dynamics are transferred from the observation space to a continuous latent space, while linguistic generation remains parallel through a non-autoregressive decoder. We present a complete methodological formulation, including a causal GP prior, a structured amortized posterior, and a training protocol based on a regularized ELBO. Empirical evaluation, conducted within a deliberately constrained proof-of-concept (POC) framework, shows that the model can be trained stably and that the sequential and parallel sampling variants exhibit consistent behavior. Overall, the results suggest that part of the temporal structure in a language model can be supported by the probabilistic geometry of the latent space rather than by explicit neural operations.
LGDec 30, 2025
Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation StudyYves Ruffenach
This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an autoregressive factorization over tokens. In contrast, our prior work proposed shifting sequential structure to the latent space through a causal Gaussian process, while using a non-autoregressive decoder. Here, we conduct a systematic ablation study of the role played by latent autoregression. We compare (i) a full GP-VAE model with autoregressive latent dynamics, (ii) a non-autoregressive ablation in which latent variables are independent, and (iii) a standard token-level autoregressive Transformer. Our results show that, within the considered regime (medium-scale corpora and short training contexts), latent autoregression induces latent trajectories that are significantly more compatible with the Gaussian-process prior and exhibit greater long-horizon stability. In contrast, removing autoregression leads to degraded latent structure and unstable long-range behavior. These findings highlight the role of latent autoregression as an effective mechanism for organizing long-range structure, while remaining complementary to token-level autoregressive modeling. They should be interpreted as an empirical analysis of representational structure rather than as a proposal for a new architecture.