Causally-Guided Pairwise Transformer -- Towards Foundational Digital Twins in Process Industry
This work addresses robust and adaptable foundational modelling for digital twins in the process industry, offering a flexible architecture that scales and adapts to any number of variables.
The paper tackles the trade-off between channel-dependent and channel-independent models for multi-dimensional time-series data in industrial systems by proposing the Causally-Guided Pairwise Transformer (CGPT), which integrates a known causal graph as an inductive bias and decomposes the data into variable pairs. On synthetic and real-world datasets, CGPT significantly outperforms the baselines in predictive accuracy.
Foundational modelling of multi-dimensional time-series data in industrial systems presents a central trade-off: channel-dependent (CD) models capture specific cross-variable dynamics but lack robustness and adaptability as model layers are commonly bound to the data dimensionality of the tackled use-case, while channel-independent (CI) models offer generality at the cost of modelling the explicit interactions crucial for system-level predictive regression tasks. To resolve this, we propose the Causally-Guided Pairwise Transformer (CGPT), a novel architecture that integrates a known causal graph as an inductive bias. The core of CGPT is built around a pairwise modeling paradigm, tackling the CD/CI conflict by decomposing the multidimensional data into pairs. The model uses channel-agnostic learnable layers where all parameter dimensions are independent of the number of variables. CGPT enforces a CD information flow at the pair-level and CI-like generalization across pairs. This approach disentangles complex system dynamics and results in a highly flexible architecture that ensures scalability and any-variate adaptability. We validate CGPT on a suite of synthetic and real-world industrial datasets on long-term and one-step forecasting tasks designed to simulate common industrial complexities. Results demonstrate that CGPT significantly outperforms both CI and CD baselines in predictive accuracy and shows competitive performance with end-to-end trained CD models while remaining agnostic to the problem dimensionality.
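To make the pairwise decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each pair corresponds to one directed edge (source, target) of the known causal graph, that the graph is given as a binary adjacency matrix, and that the series has shape (T, D); the function names `causal_pairs` and `decompose_into_pairs` are hypothetical. The point it illustrates is that every pair has the same shape (T, 2) regardless of D, so any layer applied per pair can have parameters that do not depend on the number of variables.

```python
import numpy as np


def causal_pairs(adjacency):
    """Return (source, target) index pairs, one per directed edge.

    Assumption: adjacency[i, j] != 0 means variable i causally influences j.
    """
    adjacency = np.asarray(adjacency)
    if adjacency.ndim != 2 or adjacency.shape[0] != adjacency.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    src, tgt = np.nonzero(adjacency)  # edges in row-major order
    return list(zip(src.tolist(), tgt.tolist()))


def decompose_into_pairs(series, adjacency):
    """Split a (T, D) series into a (P, T, 2) stack, one slice per causal edge.

    Column 0 of each slice is the source variable, column 1 the target.
    """
    series = np.asarray(series, dtype=float)
    pairs = causal_pairs(adjacency)
    if series.ndim != 2 or series.shape[1] != np.asarray(adjacency).shape[0]:
        raise ValueError("series must have shape (T, D) matching the graph")
    if not pairs:
        # No edges: return an empty stack with the expected trailing shape.
        return np.empty((0, series.shape[0], 2)), pairs
    stacked = np.stack([series[:, [s, t]] for s, t in pairs], axis=0)
    return stacked, pairs
```

Under these assumptions, a graph over three variables with edges 0→1, 0→2, and 1→2 yields a stack of three (T, 2) slices, and a graph over thirty variables yields slices of exactly the same shape. A shared model applied to each slice would see channel-dependent information within the pair while generalizing across pairs in a channel-independent manner.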