Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts
This work addresses robustness issues in controller synthesis for systems such as air traffic control; the contribution is incremental, as it builds on existing RL methods.
The paper tackles anisotropic generalization in reinforcement learning for on-the-fly directed controller synthesis, where a policy performs well only in specific regions of the domain-parameter space. It proposes a Soft Mixture-of-Experts framework that expands the solvable parameter space and improves robustness on the Air Traffic benchmark.
On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by exploring the system incrementally, and it relies critically on an exploration policy to guide the search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts (Soft-MoE) framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. Evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
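The abstract does not spell out the form of the prior-confidence gate, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each expert scores the same set of candidate exploration actions, that the "prior" is a fixed per-expert weight (for example, derived from how well that expert did in a nearby region of the parameter space), and that "confidence" is the peak probability of the expert's softmax distribution. The names `soft_moe_select`, `expert_scores`, and `priors` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_select(expert_scores, priors):
    """Blend several experts' action preferences with a soft gate.

    expert_scores: one list of raw action scores per expert (same length each).
    priors: one non-negative prior weight per expert (assumed, not from the paper).
    Returns (index of chosen action, gate weights, mixed distribution).
    """
    # Each expert's scores become a probability distribution over actions.
    dists = [softmax(s) for s in expert_scores]
    # Assumed confidence measure: how peaked each expert's distribution is.
    confidences = [max(d) for d in dists]
    # Assumed gate: prior times confidence, normalized to sum to 1.
    raw = [p * c for p, c in zip(priors, confidences)]
    z = sum(raw)
    gates = [r / z for r in raw]
    # Soft mixture: every expert contributes, weighted by its gate value.
    n_actions = len(dists[0])
    mixed = [sum(g * d[i] for g, d in zip(gates, dists)) for i in range(n_actions)]
    best = max(range(n_actions), key=lambda i: mixed[i])
    return best, gates, mixed


# Two experts, three candidate actions: expert 0 is decisive, expert 1 is unsure.
action, gates, mixed = soft_moe_select(
    expert_scores=[[5.0, 0.0, 0.0], [0.0, 0.1, 0.0]],
    priors=[0.5, 0.5],
)
```

In this sketch the decisive expert receives the larger gate weight, while the unsure expert still contributes to the mixture. That is what separates a soft gate from a hard switch to a single expert, and it is one way region-specific experts could act as complementary specializations.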