Junya Ikemoto

9 papers

27 citations

Novelty 51%

AI Score 49

Ranked #47,630 of 201,326 authors (top 24%)

#195 in SY (top 19%)

9 Papers

SY May 13

Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters

Junya Ikemoto

This paper proposes a simulation-based reinforcement learning algorithm for controlling systems with uncertain and varying system parameters. While simulators are useful for safely learning control policies, the reality gap remains a major challenge. To alleviate this challenge, we propose a two-stage algorithm. First, multiple control policies are learned for systems with different system parameters in a simulator. Second, for a real system, the control policies are adaptively switched using an online convex optimization algorithm based on observations. This approach is expected to reduce learning complexity compared with existing approaches that rely on a single policy to address the reality gap.
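The second stage described above can be illustrated with a minimal sketch. The abstract does not name the online convex optimization algorithm, so this example assumes an exponentiated-gradient (Hedge) update over the probability simplex; the function names, the learning rate `eta`, the per-expert losses, and the scalar actions are all hypothetical and not taken from the paper.

```python
import math

def update_weights(weights, losses, eta=1.0):
    """One exponentiated-gradient (Hedge) step; the result stays on the simplex."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def blend_actions(weights, actions):
    """Soft switching: a convex combination of the experts' proposed actions."""
    return sum(w * a for w, a in zip(weights, actions))

# Three hypothetical expert policies, each trained in simulation
# for a different value of the uncertain system parameter.
weights = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-expert losses computed from observations of the real system
# (assumed constant here purely for illustration).
losses = [0.9, 0.1, 0.5]

for _ in range(10):
    weights = update_weights(weights, losses)

# Hypothetical scalar actions proposed by the three experts at the current state.
action = blend_actions(weights, [-1.0, 0.2, 1.0])
```

With these assumed losses, the weight mass concentrates on the second expert, so the blended action moves toward that expert's proposal while still remaining a mixture.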

SY Apr 3

Data-Driven Synthesis of Probabilistic Controlled Invariant Sets for Linear MDPs

Kazumune Hashimoto, Shunki Kimura, Kazunobu Serizawa et al.

We study data-driven computation of probabilistic controlled invariant sets (PCIS) for safety-critical reinforcement learning under unknown dynamics. Assuming a linear MDP model, we use regularized least squares and self-normalized confidence bounds to construct a conservative estimate of the states from which the system can be kept inside a prescribed safe region over an $N$-step horizon, together with the corresponding set-valued safe action map. This construction is obtained through a backward recursion and can be interpreted as a conservative approximation of the $N$-step safety predecessor operator. When the associated conservative-inclusion event holds, a conservative fixed point of the approximate recursion can be certified as an $(N,ε)$-PCIS with confidence at least $η$. For continuous state spaces, we introduce a lattice abstraction and a Lipschitz-based discretization error bound to obtain a tractable approximation scheme. Finally, we use the resulting conservative fixed-point approximation as a runtime candidate PCIS in a practical shielding architecture with iterative updates, and illustrate the approach on a numerical experiment.

CL Mar 30

Structural-Ambiguity-Aware Translation from Natural Language to Signal Temporal Logic

Kosei Fushimi, Kazunobu Serizawa, Junya Ikemoto et al.

Signal Temporal Logic (STL) is widely used to specify timed and safety-critical tasks for cyber-physical systems, but writing STL formulas directly is difficult for non-expert users. Natural language (NL) provides a convenient interface, yet its inherent structural ambiguity makes one-to-one translation into STL unreliable. In this paper, we propose an ambiguity-preserving method for translating NL task descriptions into STL candidate formulas. The key idea is to retain multiple plausible syntactic analyses instead of forcing a single interpretation at the parsing stage. To this end, we develop a three-stage pipeline based on Combinatory Categorial Grammar (CCG): ambiguity-preserving $n$-best parsing, STL-oriented template-based semantic composition, and canonicalization with score aggregation. The proposed method outputs a deduplicated set of STL candidates with plausibility scores, thereby explicitly representing multiple possible formal interpretations of an ambiguous instruction. In contrast to existing one-best NL-to-logic translation methods, the proposed approach is designed to preserve attachment and scope ambiguity. Case studies on representative task descriptions demonstrate that the method generates multiple STL candidates for genuinely ambiguous inputs while collapsing unambiguous or canonically equivalent derivations to a single STL formula.

SY Apr 28

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems

Junya Ikemoto, Satoshi Maruyama, Kazumune Hashimoto

This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
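The two ingredients named above, a rule-based trigger on blood glucose changes and an SMDP formulation with irregular decision intervals, can be sketched as follows. This is an illustration under stated assumptions, not the paper's controller: the threshold value, the function names, and the use of a standard SMDP bootstrapped target (rewards discounted within the interval, next value discounted by $\gamma^{\tau}$) are all assumptions made here.

```python
def should_update(glucose, glucose_at_last_update, threshold=10.0):
    """Hypothetical trigger: request a new control update only when blood glucose
    has moved by at least `threshold` (assumed units: mg/dL) since the last update."""
    return abs(glucose - glucose_at_last_update) >= threshold

def smdp_target(rewards, gamma, next_value):
    """Bootstrapped target for one SMDP transition lasting tau = len(rewards) steps:
    sum_k gamma^k * r_k  +  gamma^tau * V(next state)."""
    tau = len(rewards)
    accumulated = sum(gamma**k * r for k, r in enumerate(rewards))
    return accumulated + gamma**tau * next_value

# A small change does not trigger communication; a large one does.
small_change = should_update(112.0, 110.0)
large_change = should_update(125.0, 110.0)

# A transition that lasted three sampling steps between two triggered updates.
target = smdp_target([1.0, 1.0, 1.0], gamma=0.5, next_value=8.0)
```

Because the interval length varies from one decision to the next, the discount applied to the bootstrapped value depends on how many steps elapsed, which is the main difference from a fixed-period MDP target.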

ML Jan 21, 2022

Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation