AIDec 29, 2025
Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor ControlYoonpyo Lee, Kazuma Kobayashi, Sai Puppala et al.
The prevailing paradigm in AI for physical systems, scaling general-purpose foundation models toward universal multimodal reasoning, confronts a fundamental barrier at the control interface. Recent benchmarks show that even frontier vision-language models achieve only 50-53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility while violating physical constraints. This input unfaithfulness is not a scaling deficiency but a structural limitation. Perception-centric architectures optimize parameter-space imitation, whereas safety-critical control demands outcome-space guarantees over executed actions. Here, we present a fundamentally different pathway toward domain-specific foundation models by introducing compact language models operating as Agentic Physical AI, in which policy optimization is driven by physics-based validation rather than perceptual inference. We train a 360-million-parameter model on synthetic reactor control scenarios, scaling the dataset from 10^3 to 10^5 examples. This induces a sharp phase transition absent in general-purpose models. Small-scale systems exhibit high-variance imitation with catastrophic tail risk, while large-scale models undergo variance collapse exceeding 500x reduction, stabilizing execution-level behavior. Despite balanced exposure to four actuation families, the model autonomously rejects approximately 70% of the training distribution and concentrates 95% of runtime execution on a single-bank strategy. Learned representations transfer across distinct physics and continuous input modalities without architectural modification.
85.0CRApr 8
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI PipelinesTanzim Ahad, Ismail Hossain, Md Jahangir Alam et al.
We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing the compositional safety gap is closable.
LGDec 12, 2025
ReactorFold: Generative discovery of nuclear reactor cores via emergent physical reasoningYoonpyo Lee
Designing nuclear reactor cores requires navigating large discrete design spaces governed by complex neutronic interactions. Traditional deterministic, metaheuristic, and machine-learning-assisted methods search within fixed, human-defined configuration spaces, limiting their ability to discover fundamentally new design topologies. Here we introduce ReactorFold, a generative framework that reformulates fuel-assembly design as a sequence modeling problem for language models. Using Monte Carlo data, parameter-efficient fine-tuning, and Direct Preference Optimization (DPO), the model learns the latent structure of a pressurized-water-reactor assembly and generates candidate layouts in a single forward pass. Notably, the DPO-aligned model exhibits emergent design-space expansion: despite being trained exclusively on configurations with a fixed number of gadolinium burnable absorber (Gd) rods, it autonomously adjusts Gd inventory to satisfy strict power-peaking constraints. The model also discovers high-performing asymmetric configurations that challenge conventional symmetric loading heuristics, accessing design regimes inaccessible to conventional search methods and demonstrating that language models can internalize causal physical relationships and transcend human-imposed design constraints.
76.8LGApr 8
When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard ModelsIsmail Hossain, Sai Puppala, Jannatul Ferdaus et al.
A guard model fine-tuned on entirely benign data can lose all safety alignment -- not through adversarial manipulation, but through standard domain specialization. We demonstrate this failure across three purpose-built safety classifiers -- LlamaGuard, WildGuard, and Granite Guardian -- deployed as protection layers in agentic AI pipelines, and show that it originates in the destruction of latent safety geometry: the structured harmful -- benign representational boundary that guides classification. We extract per-layer safety subspaces via SVD on class-conditional activation differences and track how this boundary evolves under benign fine-tuning. Granite Guardian undergoes complete collapse -- refusal rate drops from 85\% to 0\%, CKA falls to zero, and 100\% of outputs become ambiguous -- a severity exceeding prior findings on general-purpose LLMs, explained by the specialization hypothesis: concentrated safety representations are efficient but catastrophically brittle. To mitigate this, we propose Fisher-Weighted Safety Subspace Regularization (FW-SSR), a training-time penalty combining (i) curvature-aware direction weights derived from diagonal Fisher information and (ii) an adaptive $λ_t$ that scales with task-safety gradient conflict. FW-SSR recovers 75\% refusal on Granite Guardian (CKA = 0.983) and reduces WildGuard's Attack Success Rate to 3.6\% -- below the unmodified baseline -- by actively sharpening the safety subspace rather than merely anchoring it. Across all three models, structural representational geometry (CKA, Fisher score) predicts safety behavior more reliably than absolute displacement metrics, establishing geometry-based monitoring as a necessary component of guard model evaluation in agentic deployments.