Heng-Sheng Chang

LG
h-index: 6
9 papers
25 citations
Novelty: 48%
AI Score: 48

9 Papers

LG · May 15
Transformer-like Inference from Optimal Control

Aditya Kudre, Heng-Sheng Chang, Prashant G. Mehta

Decoder-only transformers compute the conditional probability of the next token from a sequence of past observations. This paper derives, from first principles, inference architectures that solve the same prediction problem and, in doing so, recovers transformer-like layer operations as a consequence of optimal control theory. The framework is developed for two model classes: a nonlinear model of discrete-valued processes, directly motivated by the transformer, and a linear Gaussian model as a tractable baseline. For both model classes, the prediction objective is reformulated as an optimal control problem whose solution yields an explicit inference algorithm, the dual filter, with a layer structure that mirrors that of a decoder-only transformer. Numerical experiments compare the optimal control with the attention weights of a trained transformer. These experiments reveal that when the embedding dimension is insufficient, the transformer implicitly exploits non-Markovian structure.

SY · Apr 5
Duality Theory for Non-Markovian Linear Gaussian Models

Aditya Kudre, Heng-Sheng Chang, Prashant G. Mehta

This work develops a duality theory for partially observed linear Gaussian models in discrete time. The state process evolves according to a causal but non-Markovian (or higher-order Gauss-Markov) structure, captured by a lower-triangular transition operator, which is related to the transformer, with $T$ as the context length. The main contributions are: (i) a dual control system for the linear Gaussian model, formulated as a backward difference equation (B$\Delta$E); (ii) a duality principle establishing that a specific linear-quadratic optimal control problem for the B$\Delta$E is dual to the filtering problem for the partially observed model; and (iii) an explicit optimal control formula yielding a novel (transformer-like) linear predictor, referred to as the dual filter, whose computational complexity scales linearly in the time horizon $T$, in contrast to the $O(T^3)$ cost of classical smoothing and Wiener-Hopf approaches.

LG · Nov 13, 2025
Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations

Reginald Zhiyan Chen, Heng-Sheng Chang, Prashant G. Mehta

Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch (EM) algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability outputs outside valid ranges. This work introduces Belief Net, a novel framework that learns HMM parameters through gradient-based optimization by formulating the HMM's forward filter as a structured neural network. Unlike black-box Transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, ensuring full interpretability. The model processes observation sequences using a decoder-only architecture and is trained end-to-end with standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net achieves superior convergence speed compared to Baum-Welch, successfully recovering parameters in both undercomplete and overcomplete settings where spectral methods fail. Comparisons with Transformer-based models are also presented on real-world language data.
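To make the idea of a forward filter parameterized by logits concrete, here is a minimal sketch in Python. It is not the Belief Net implementation: the function names, the array shapes, and the use of NumPy are assumptions made for illustration, and only the forward pass and the next-observation loss are shown (no gradient-based training).

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def next_observation_probs(init_logits, trans_logits, emit_logits, observations):
    """Run the HMM forward filter and return predictive distributions.

    Hypothetical helper. Row t of the result is
    P(observation t | observations 0..t-1), so the result has
    len(observations) + 1 rows.
    """
    pi = softmax(init_logits)          # initial state distribution, shape (d,)
    A = softmax(trans_logits, axis=1)  # A[i, j] = P(next state j | state i)
    C = softmax(emit_logits, axis=1)   # C[i, k] = P(observation k | state i)

    belief = pi                        # predicted state distribution
    preds = [belief @ C]               # prediction before any observation
    for z in observations:
        posterior = belief * C[:, z]   # Bayes update with observation z
        posterior /= posterior.sum()
        belief = posterior @ A         # propagate one step through the chain
        preds.append(belief @ C)       # predictive distribution of next symbol
    return np.array(preds)

def autoregressive_nll(preds, observations):
    """Mean negative log-likelihood of each observation under its prediction."""
    observations = np.asarray(observations)
    steps = np.arange(len(observations))
    return -np.mean(np.log(preds[steps, observations]))
```

In a trainable version, the three logit arrays would be the learnable weights and `autoregressive_nll` would be minimized with an automatic-differentiation framework; that step is omitted here.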

LG · May 1, 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures

Heng-Sheng Chang, Prashant G. Mehta

This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our objective is not to construct a mathematical model of a transformer. Rather, our interest lies in deriving, from first principles, transformer-like architectures that solve the prediction problem for which the transformer is designed. The proposed framework is based on an original optimal control approach, where the prediction objective (MMSE) is reformulated as an optimal control problem. An analysis of the optimal control problem is presented, leading to a fixed-point equation on the space of probability measures. To solve the fixed-point equation, we introduce the dual filter, an iterative algorithm that closely parallels the architecture of decoder-only transformers. These parallels are discussed in detail along with the relationship to prior work on mathematical modeling of transformers as transport on the space of probability measures. Numerical experiments are provided to illustrate the performance of the algorithm using parameter values used in research-scale transformer models.

LG · Aug 27, 2025
What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture

Heng-Sheng Chang, Prashant G. Mehta

In the 1940s, Wiener introduced a linear predictor, where the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor where the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case when the probabilistic model is a hidden Markov model (HMM). In part, this paper is an attempt to bridge classical nonlinear filtering theory with modern inference architectures.
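As a small illustration of "linearly combining the past data", the sketch below fits a finite-order linear predictor by ordinary least squares. This is a stand-in, not Wiener's original construction, which works with known correlation functions. The function names and the choice of NumPy are assumptions made for this example.

```python
import numpy as np

def fit_linear_predictor(signal, order):
    """Least-squares weights w with signal[t] ~= w[0]*signal[t-1] + ... + w[order-1]*signal[t-order]."""
    signal = np.asarray(signal, dtype=float)
    # Each row holds the `order` most recent past samples, newest first.
    X = np.array([signal[t - order:t][::-1] for t in range(order, len(signal))])
    y = signal[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_next(signal, w):
    """Predict the sample that follows `signal` as a linear combination of its last len(w) samples."""
    past = np.asarray(signal, dtype=float)[-len(w):][::-1]
    return float(w @ past)
```

For a noise-free signal generated by a fixed linear recursion, the fitted weights recover the recursion coefficients. A nonlinear predictor such as a transformer replaces the fixed weighted sum with a data-dependent combination of past tokens.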

RO · Sep 17, 2021
A physics-informed, vision-based method to reconstruct all deformation modes in slender bodies

Seung Hyun Kim, Heng-Sheng Chang, Chia-Hsien Shih et al.

This paper is concerned with the problem of estimating (interpolating and smoothing) the shape (pose and the six modes of deformation) of a slender flexible body from multiple camera measurements. This problem is important in both biology, where slender, soft, and elastic structures are ubiquitously encountered across species, and in engineering, particularly in the area of soft robotics. The proposed mathematical formulation for shape estimation is physics-informed, based on the use of the special Cosserat rod theory whose equations encode slender body mechanics in the presence of bending, shearing, twisting and stretching. The approach is used to derive numerical algorithms which are experimentally demonstrated for fiber reinforced and cable-driven soft robot arms. These experimental demonstrations show that the methodology is accurate (<5 mm error, three times less than the arm diameter) and robust to noise and uncertainties.

OC · Oct 2, 2020
Optimal Control of a Soft CyberOctopus Arm

Tixian Wang, Udit Halder, Heng-Sheng Chang et al.

In this paper, we use the optimal control methodology to control a flexible, elastic Cosserat rod. Inspiration comes from stereotypical movement patterns in octopus arms, which are observed in a variety of manipulation tasks, such as reaching or fetching. To help uncover the mechanisms underlying these observed morphologies, we outline an optimal control-based framework. A single octopus arm is modeled as a Hamiltonian control system, where the continuum mechanics of the arm is modeled after the Cosserat rod theory, and internal, distributed muscle forces and couples are considered as controls. First-order necessary optimality conditions are derived for an optimal control problem formulated for this infinite-dimensional system. Solutions to this problem are obtained numerically by an iterative forward-backward algorithm. The state and adjoint equations are solved in a dynamic simulation environment, setting the stage for studying a broader class of optimal control problems. Trajectories that minimize control effort are demonstrated and qualitatively compared with observed behaviors.

RO · Oct 2, 2020
Controlling a CyberOctopus Soft Arm with Muscle-like Actuation

Heng-Sheng Chang, Udit Halder, Ekaterina Gribkova et al.

This paper presents an application of the energy shaping methodology to control a flexible, elastic Cosserat rod model of a single octopus arm. The novel contributions of this work are two-fold: (i) a control-oriented modeling of the anatomically realistic internal muscular architecture of an octopus arm; and (ii) the integration of these muscle models into the energy shaping control methodology. The control-oriented modeling takes inspiration in equal parts from theories of nonlinear elasticity and energy shaping control. By introducing a stored energy function for muscles, the difficulties associated with explicitly solving the matching conditions of the energy shaping methodology are avoided. The overall control design problem is posed as a bilevel optimization problem. Its solution is obtained through iterative algorithms. The methodology is numerically implemented and demonstrated in Elastica, a full-scale dynamic simulation environment. Two bio-inspired numerical experiments involving the control of octopus arms are reported.

SY · Apr 13, 2020
Energy Shaping Control of a CyberOctopus Soft Arm

Heng-Sheng Chang, Udit Halder, Chia-Hsien Shih et al.

This paper applies the energy shaping methodology to control a flexible, elastic Cosserat rod model. Recent interest in such continuum models stems from applications in soft robotics, and from the growing recognition of the role of mechanics and embodiment in biological control strategies: octopuses are often regarded as iconic examples of this interplay. Here, the dynamics of the Cosserat rod, modeling a single octopus arm, are treated as a Hamiltonian system and the internal muscle actuators are modeled as distributed forces and couples. The proposed energy shaping control design procedure involves two steps: (1) a potential energy is designed such that its minimizer is the desired equilibrium configuration; (2) an energy shaping control law is implemented to reach the desired equilibrium. By interpreting the controlled Hamiltonian as a Lyapunov function, asymptotic stability of the equilibrium configuration is deduced. The energy shaping control law is shown to require only the deformations of the equilibrium configuration. A forward-backward algorithm is proposed to compute these deformations in an online iterative manner. The overall control design methodology is implemented and demonstrated in a dynamic simulation environment. Results of several bio-inspired numerical experiments involving the control of octopus arms are reported.