RO SY SY PLASM-PHMay 15

Dynamic Plasma Shape Control with Arbitrary Sensor Subsets

D. Sorokin, M. Stokolesov, A. Granovskiy, I. Prokofyev, E. Adishchev, M. Nurgaliev, E. Khayrutdinov, G. Subbotin, R. Clark, D. Orlov

arXiv:2605.1593516.6

AI Analysis

This work addresses the need for robust, real-time plasma shape control in tokamaks under sensor failures, offering a single policy that eliminates backup controllers.

A reinforcement learning agent was developed for tokamak plasma shape control that tracks dynamic shape targets and tolerates arbitrary sensor failures, achieving a mean shape error of 2.01 cm in simulation and transferring to experimental DIII-D shots.

Plasma shape control in tokamaks requires a real-time controller that tracks dynamically changing shape targets while tolerating diagnostic failures. Classical approaches decompose the problem into equilibrium reconstruction followed by a linear controller, and assume a fixed, fully operational sensor set. We present a reinforcement learning agent that addresses both limitations simultaneously. The agent is trained in NSFsim, a high-fidelity tokamak simulator configured for DIII-D, on a curated dataset of 120 experimental plasma shapes. The shape targets are resampled as random step changes every 0.25 s, exposing the agent to diverse transitions across the full shape envelope. At test time the agent zero-shot tracks dynamic shape sequences; on a held-out static configuration in simulation it achieves a mean shape error of 2.01 cm, and dynamic trajectory following is demonstrated qualitatively in simulation and on the physical device. Diagnostic dropout randomly masks 30% of magnetic sensors per episode, yielding a single policy robust to arbitrary sensor subsets without backup controllers or mode-switching logic. An asymmetric actor-critic architecture with privileged equilibrium information improves value estimation under partial observability; an auxiliary shape reconstruction head on the actor enables end-to-end shape reconstruction from raw diagnostics and serves as an interpretability tool for policy analysis. The policy transfers to experimental DIII-D shots, where it directly commands the coil actuators on two dynamic shape maneuvers, and to the independent GSevolve simulator.

View on arXiv PDF

Similar