CR · Apr 28

Large Language Models as Explainable Cyberattack Detectors for Energy Industrial Control Systems

Weiyi Kong, Ahmad Mohammad Saber, Amr Youssef, Deepa Kundur

arXiv:2604.26079 · 53.4

Predicted impact: top 34% in CR · last 90 days
Originality: Synthesis-oriented

AI Analysis

For operators of energy industrial control systems, this work provides a complementary, human-in-the-loop intrusion detection method that outputs audit records, though the approach is incremental as it applies existing LLM technology to a known problem.

The paper demonstrates that an off-the-shelf LLM can serve as an explainable cyberattack detector for energy ICS, achieving high predictive performance comparable to supervised baselines on two public Modbus datasets without task-specific weight updates.

In modern energy systems, industrial control systems (ICS) and power-system SCADA require intrusion detection that is not only accurate but also auditable by operators. The ICS intrusion-detection landscape is currently dominated by established supervised detectors. In this paper, we study whether an off-the-shelf large language model (LLM) can serve as a complementary, human-in-the-loop layer for Modbus traffic. We cast this as a binary network-side normal/critical decision task on two public ICS Modbus datasets, collapsing attack periods and other safety-critical behaviors into a single critical class. Each Modbus communication instance is converted into a compact token string derived from discretized protocol fields, and a prompt-configured LLM produces a normal/critical alert together with a concise, token-grounded incident record for analyst review. Under matched event information and shared evaluation splits, the resulting LLM-based triage pipeline achieves high predictive performance on both benchmarks and is broadly comparable to strong supervised baselines, while requiring no task-specific weight updates. To assess the audit record, we apply intervention-based diagnostics, including sufficiency- and necessity-style tests, which provide evidence that the cited tokens are often decision-relevant to the model's own prediction. These records are intended as audit signals rather than full human-grounded explanations.
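The tokenization and intervention-style checks described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the chosen fields, the bin edges, the token names, the deletion-based interventions, and `toy_predict`, a hypothetical rule-based stand-in for the prompt-configured LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModbusEvent:
    # Hypothetical subset of protocol fields; the paper's feature set may differ.
    function_code: int
    address: int
    value: int
    inter_arrival_ms: float


def bucket(x, edges):
    """Return the index of the first edge that x falls below (len(edges) if none)."""
    for i, edge in enumerate(edges):
        if x < edge:
            return i
    return len(edges)


def tokenize(ev):
    """Convert one event into a compact list of discretized tokens (assumed bin edges)."""
    return [
        f"FC{ev.function_code}",
        f"ADDR_B{bucket(ev.address, (100, 1000, 10000))}",
        f"VAL_B{bucket(ev.value, (1, 256, 4096))}",
        f"DT_B{bucket(ev.inter_arrival_ms, (10.0, 100.0, 1000.0))}",
    ]


def toy_predict(tokens):
    """Stand-in for the LLM: flags a multi-register write carrying a large value."""
    return "critical" if "FC16" in tokens and "VAL_B3" in tokens else "normal"


def sufficiency(predict, tokens, cited):
    """Keep only the cited tokens; True if the prediction is unchanged."""
    kept = [t for t in tokens if t in cited]
    return predict(kept) == predict(tokens)


def necessity(predict, tokens, cited):
    """Delete the cited tokens; True if the prediction changes."""
    rest = [t for t in tokens if t not in cited]
    return predict(rest) != predict(tokens)


event = ModbusEvent(function_code=16, address=40001, value=5000, inter_arrival_ms=3.2)
tokens = tokenize(event)
label = toy_predict(tokens)
cited = {"FC16", "VAL_B3"}  # tokens a hypothetical incident record might cite
print(" ".join(tokens), "->", label)
print("sufficient:", sufficiency(toy_predict, tokens, cited))
print("necessary:", necessity(toy_predict, tokens, cited))
```

In a real pipeline `toy_predict` would be replaced by a call to the LLM, and the two checks would be aggregated over many events. A single event passing or failing says little on its own, which matches the abstract's framing of the records as audit signals.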
