Yohei Hosoe

LG
h-index: 9
5 papers
38 citations
Novelty: 62%
AI Score: 35

5 Papers

LG | Apr 18, 2023 | Code
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints

Kazumi Kasaura, Shuwa Miura, Tadashi Kozuno et al.

This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments, encompassing multiple action constraint types. Our evaluation provides the first in-depth perspective of the field, revealing surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and associated code utilized in our experiments are made available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
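To make the notion of an action constraint concrete, here is a minimal sketch of one way to force an action to comply with a constraint: projecting it onto the feasible set. The abstract does not say which constraint types or which baseline the benchmark uses, so the L2-norm bound and the function name `project_to_l2_ball` are illustrative assumptions, not the benchmark's code.

```python
import numpy as np

def project_to_l2_ball(action, max_norm):
    """Return the point of the ball {a : ||a||_2 <= max_norm} closest to `action`.

    Hypothetical helper for illustration; the constraint type is an assumption.
    """
    action = np.asarray(action, dtype=float)
    norm = np.linalg.norm(action)
    if norm <= max_norm:
        # Already feasible: leave the action unchanged.
        return action
    # Infeasible: rescale onto the boundary, which is the Euclidean projection.
    return action * (max_norm / norm)

raw_action = np.array([3.0, 4.0])                 # norm 5, violates the bound
safe_action = project_to_l2_ball(raw_action, 1.0)
```

A policy output that already satisfies the bound passes through unchanged, so the projection only alters infeasible actions.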

SY | Feb 28, 2019
Equivalent Stability Notions, Lyapunov Inequality, and Its Application in Discrete-Time Linear Systems with Stochastic Dynamics Determined by an i.i.d. Process

Yohei Hosoe, Tomomichi Hagiwara

This paper is concerned with stability analysis and synthesis for discrete-time linear systems with stochastic dynamics. Equivalence is first proved for three stability notions under some key assumptions on the randomness behind the systems. In particular, we use the assumption that the stochastic process determining the system dynamics is independent and identically distributed (i.i.d.) with respect to the discrete time. Then, a Lyapunov inequality condition is derived for stability in a necessary and sufficient sense. Although our Lyapunov inequality will involve decision variables contained in the expectation operation, an idea is provided to solve it as a standard linear matrix inequality; the idea also plays an important role in state feedback synthesis based on the Lyapunov inequality. Motivating numerical examples are further discussed as an application of our approach.
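As a rough illustration of second-moment stability under i.i.d. dynamics, the sketch below checks a system x[k+1] = A(xi[k]) x[k] whose random matrix takes finitely many values. It uses the standard Kronecker-product test (spectral radius of E[A ⊗ A] below 1) rather than the paper's linear matrix inequality; the finite-support distribution and the name `second_moment_stable` are assumptions made for the example.

```python
import numpy as np

def second_moment_stable(matrices, probs):
    """Check whether E[x x^T] decays to zero for x[k+1] = A(xi[k]) x[k].

    `matrices` lists the possible values of A and `probs` their probabilities
    (an assumed finite-support i.i.d. distribution). Because xi[k] is
    independent of x[k], vec(E[x x^T]) is propagated by E[A kron A].
    """
    M = sum(p * np.kron(A, A) for A, p in zip(matrices, probs))
    spectral_radius = np.max(np.abs(np.linalg.eigvals(M)))
    return bool(spectral_radius < 1.0)

# Scalar example: the gain is 0.5 or 1.2 with equal probability.
# E[a^2] = (0.25 + 1.44) / 2 = 0.845, so the second moment contracts.
stable = second_moment_stable([np.array([[0.5]]), np.array([[1.2]])], [0.5, 0.5])

# With gains 0.5 or 1.5, E[a^2] = 1.25, so the second moment grows.
unstable = second_moment_stable([np.array([[0.5]]), np.array([[1.5]])], [0.5, 0.5])
```

Note that the first example is stable in the second moment even though one realization of the gain exceeds 1, which is why the test is applied to the expectation rather than to each matrix.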

LG | Aug 29, 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai et al.

Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a bisection search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\tilde{\mathcal{O}}(\varepsilon^{-4})$ robust policy evaluations.
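The epigraph idea can be sketched on a toy constrained minimization: find the smallest threshold b for which some candidate x has max(f(x) - b, g(x)) <= 0, using bisection on b. This is only an illustration under strong simplifications: the inner minimization is a brute-force search over a grid instead of a policy gradient subroutine, there is no worst case over environments, and all names (`epigraph_bisection`, `candidates`) are hypothetical.

```python
import numpy as np

def epigraph_bisection(f, g, candidates, lo, hi, iters=50):
    """Approximate min f(x) subject to g(x) <= 0 via the epigraph form.

    For a threshold b, phi(b) = min_x max(f(x) - b, g(x)) is nonincreasing
    in b, and phi(b) <= 0 exactly when some feasible x attains f(x) <= b.
    Assumes phi(lo) > 0 and phi(hi) <= 0.
    """
    fx, gx = f(candidates), g(candidates)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Inner problem: only the larger of the two terms matters at each x.
        phi = np.min(np.maximum(fx - mid, gx))
        if phi <= 0.0:
            hi = mid   # threshold is achievable, try a smaller one
        else:
            lo = mid   # threshold is not achievable, raise it
    return hi

# Toy problem: minimize (x - 2)^2 subject to x <= 1; the optimum is at x = 1.
xs = np.linspace(-3.0, 3.0, 6001)
best_value = epigraph_bisection(lambda x: (x - 2.0) ** 2,
                                lambda x: x - 1.0, xs, 0.0, 10.0)
```

The returned threshold lands close to 1, the constrained optimum, up to the resolution of the grid. Taking the maximum of the two terms, rather than a weighted sum as in a Lagrangian, is what lets the inner step follow a single term at a time.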

LG | Feb 14, 2025
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno et al.

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{\mathcal{O}}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.

SY | Apr 10, 2019
Distribution Modeling and Stabilization Control for Discrete-Time Linear Random Dynamical Systems Using Ensemble Kalman Filter

Yohei Hosoe, Dimitri Peaucelle

This paper studies an output feedback stabilization control framework for discrete-time linear systems with stochastic dynamics determined by an independent and identically distributed (i.i.d.) process. The controller is constructed with an ensemble Kalman filter (EnKF) and a feedback gain designed with our earlier result about state feedback control. The EnKF is also used for modeling the distribution behind the system, which is required in the feedback gain synthesis. The effectiveness of our control framework is demonstrated with numerical experiments. This study will become the first step toward the realization of learning type control using our stochastic systems control theory.
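For readers unfamiliar with the ensemble Kalman filter, the following is a minimal sketch of a generic EnKF analysis (update) step with perturbed observations. It is a textbook-style illustration, not the controller or the distribution-modeling procedure from the paper; the function name `enkf_analysis`, the linear observation matrix `H`, and the toy numbers are assumptions.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """One EnKF analysis step with perturbed observations.

    ensemble: (n, N) array of N state samples, y: (m,) observation,
    H: (m, n) observation matrix, R: (m, m) observation noise covariance.
    """
    n, N = ensemble.shape
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (N - 1)            # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    # Each member is updated against its own noisy copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    innovations = y[:, None] + noise - H @ ensemble
    return ensemble + K @ innovations

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(1, 500))   # scalar state, 500 members
posterior = enkf_analysis(prior, np.array([2.0]), np.eye(1),
                          np.array([[0.01]]), rng)
```

With an accurate observation (small `R`) the posterior ensemble concentrates near the observed value, and its spread shrinks relative to the prior.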