PLASM-PHFeb 5Code
TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma ModelsCécile Rousseau, Samuel Jackson, Rodrigo H. Ordonez-Hurtado et al.
Development and operation of commercially viable fusion energy reactors such as tokamaks require accurate predictions of plasma dynamics from sparse, noisy, and incomplete sensors readings. The complexity of the underlying physics and the heterogeneity of experimental data pose formidable challenges for conventional numerical methods, while simultaneously highlight the promise of modern data-native AI approaches. A major obstacle in realizing this potential is, however, the lack of curated, openly available datasets and standardized benchmarks. Existing fusion datasets are scarce, fragmented across institutions, facility-specific, and inconsistently annotated, which limits reproducibility and prevents a fair and scalable comparison of AI approaches. In this paper, we introduce TokaMark, a structured benchmark to evaluate AI models on real experimental data collected from the Mega Ampere Spherical Tokamak (MAST). TokaMark provides a comprehensive suite of tools designed to (i) unify access to multi-modal heterogeneous fusion data, and (ii) harmonize formats, metadata, temporal alignment and evaluation protocols to enable consistent cross-model and cross-task comparisons. The benchmark includes a curated list of 14 tasks spanning a range of physical mechanisms, exploiting a variety of diagnostics and covering multiple operational use cases. A baseline model is provided to facilitate transparent comparison and validation within a unified framework. By establishing a unified benchmark for both the fusion and AI-for-science communities, TokaMark aims to accelerate progress in data-driven AI-based plasma modeling, contributing to the broader goal of achieving sustainable and stable fusion energy. The benchmark, documentation, and tooling will be fully open sourced upon acceptance to encourage community adoption and contribution.
PLASM-PHFeb 16Code
TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma DynamicsTobia Boschi, Andrea Loreti, Nicola C. Amorisco et al.
We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders - VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseline on all but one task, and that, for several tasks, lightweight fine-tuning yields better performance than training the same architecture from scratch under a matched epoch budget. These findings highlight the benefits of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights will be made publicly available.
CRMay 16, 2019
Spatial Positioning Token (SPToken) for Smart MobilityRoman Overko, Rodrigo H. Ordonez-Hurtado, Sergiy Zhuk et al.
We introduce a permissioned distributed ledger technology (DLT) design for crowdsourced smart mobility applications. This architecture is based on a directed acyclic graph architecture (similar to the IOTA tangle) and uses both Proof-of-Work and Proof-of-Position mechanisms to provide protection against spam attacks and malevolent actors. In addition to enabling individuals to retain ownership of their data and to monetize it, the architecture also is suitable for distributed privacy-preserving machine learning algorithms, is lightweight, and can be implemented in simple internet-of-things (IoT) devices. To demonstrate its efficacy, we apply this framework to reinforcement learning settings where a third party is interested in acquiring information from agents. In particular, one may be interested in sampling an unknown vehicular traffic flow in a city, using a DLT-type architecture and without perturbing the density, with the idea of realizing a set of virtual tokens as surrogates of real vehicles to explore geographical areas of interest. These tokens, whose authenticated position determines write access to the ledger, are thus used to emulate the probing actions of commanded (real) vehicles on a given planned route by "jumping" from a passing-by vehicle to another to complete the planned trajectory. Consequently, the environment stays unaffected (i.e., the autonomy of participating vehicles is not influenced by the algorithm), regardless of the number of emitted tokens. The design of such a DLT architecture is presented, and numerical results from large-scale simulations are provided to validate the proposed approach.