SY Dec 15, 2016
Output Feedback Controller Design with Symbolic Observers for Cyber-physical Systems
Masashi Mizoguchi, Toshimitsu Ushio
In this paper, we design a symbolic output feedback controller for a cyber-physical system (CPS). The physical plant is modeled by an infinite transition system. We consider the situation where a finite abstracted system of the physical plant, called a c-abstracted system, is given, and there exists an approximate alternating simulation relation from the c-abstracted system to the physical plant. A desired behavior of the c-abstracted system is also given, and we have a symbolic state feedback controller of the physical plant. We consider the case where some states of the plant are not measured. Then, to estimate the states from abstracted outputs measured by sensors, we introduce a second finite abstracted system of the physical plant, called an o-abstracted system, such that there exists an approximate simulation relation. The relation guarantees that an observer designed based on the state of the o-abstracted system estimates the current state of the plant. We construct a symbolic output feedback controller by composing these systems. By a relation-based approach, we prove that the controlled system approximately exhibits the desired behavior.
ML Jan 21, 2022
Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation
Junya Ikemoto, Toshimitsu Ushio
Deep reinforcement learning (DRL) has attracted much attention as an approach to solving optimal control problems without mathematical models of systems. In general, however, constraints may be imposed on optimal control problems. In this study, we consider optimal control problems with constraints to complete temporal control tasks. We describe the constraints using signal temporal logic (STL), which is useful for time-sensitive control tasks since it can specify continuous signals within bounded time intervals. To deal with the STL constraints, we introduce an extended constrained Markov decision process (CMDP), which is called a $τ$-CMDP. We formulate the STL-constrained optimal control problem as the $τ$-CMDP and propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
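As a rough illustration of the Lagrangian relaxation idea only, the sketch below runs primal ascent and dual ascent on a one-dimensional toy problem. It is not the two-phase DRL algorithm of the paper: the objective, the constraint, the step sizes, and the function name `primal_dual` are all assumptions chosen for illustration.

```python
def primal_dual(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Toy problem (assumed): maximize -(x - 2)^2 subject to x <= 1.

    Lagrangian: L(x, lam) = -(x - 2)^2 - lam * (x - 1), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to x.
        grad_x = -2.0 * (x - 2.0) - lam
        x += lr_x * grad_x
        # Dual step: raise lam while the constraint x <= 1 is violated,
        # lower it otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (x - 1.0))
    return x, lam


x_opt, lam_opt = primal_dual()
```

In this toy problem the unconstrained maximizer x = 2 violates the constraint, so the multiplier grows until the iterate settles on the constraint boundary.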
SY Aug 3, 2021
Deep Reinforcement Learning Based Networked Control with Network Delays for Signal Temporal Logic Specifications
Junya Ikemoto, Toshimitsu Ushio
We apply deep reinforcement learning (DRL) to the design of a networked controller with network delays to complete a temporal control task that is described by a signal temporal logic (STL) formula. STL is useful for dealing with a specification with a bounded time interval for a dynamical system. In general, an agent needs not only the current system state but also the past behavior of the system to determine a desired control action for satisfying the given STL formula. Additionally, we need to consider the effect of network delays on data transmissions. Thus, we propose an extended Markov decision process using past system states and control actions, which is called a $τd$-MDP, so that the agent can evaluate the satisfaction of the STL formula considering the network delays. Thereafter, we apply a DRL algorithm to design a networked controller using the $τd$-MDP. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
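To see why past states are needed to judge an STL formula, the following minimal sketch evaluates the standard quantitative (robustness) semantics of two bounded-time operators over a stored scalar trajectory. The predicate form `x > threshold`, the discrete time indexing, and the function names are assumptions for illustration; the paper's $τd$-MDP construction is not reproduced here.

```python
def always(signal, a, b, threshold):
    """Robustness of G_[a,b](x > threshold) at time 0.

    Positive only if x stays above the threshold at every step in [a, b].
    """
    return min(x - threshold for x in signal[a:b + 1])


def eventually(signal, a, b, threshold):
    """Robustness of F_[a,b](x > threshold) at time 0.

    Positive if x exceeds the threshold at some step in [a, b].
    """
    return max(x - threshold for x in signal[a:b + 1])


# A short stored trajectory (assumed values) of a scalar state.
trajectory = [0.0, 1.0, 2.0, 3.0]
rob_always = always(trajectory, 0, 2, 0.5)
rob_eventually = eventually(trajectory, 0, 2, 0.5)
```

Both values depend on the whole window of states, not only on the most recent one, which is the reason an agent needs a history of states to evaluate satisfaction.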
GT Apr 17, 2021
Stability analysis and control of decision-making of miners in blockchain
Kosuke Toda, Naomi Kuze, Toshimitsu Ushio
To maintain blockchain-based services while ensuring their security, an important issue is how to decide the mining reward so that the number of miners participating in the mining increases. We propose a dynamical model of decision-making for miners using an evolutionary game approach and analyze the stability of equilibrium points of the proposed model. The proposed model is described by a first-order differential equation, so it is simple, yet its theoretical analysis gives insight into the characteristics of the decision-making. Through the analysis of the equilibrium points, we show the transcritical bifurcations and hysteresis phenomena of the equilibrium points. We also design a controller that determines the mining reward based on the number of participating miners to stabilize the state in which all miners participate in the mining. Numerical simulation shows that there is a trade-off in the choice of the design parameters.
LG Jan 13, 2021
Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems
Junya Ikemoto, Toshimitsu Ushio
Applications of reinforcement learning (RL) to stabilization problems of real systems are restricted since an agent needs many experiences to learn an optimal policy and may determine dangerous actions during its exploration. If we know a mathematical model of a real system, a simulator is useful because it predicts behaviors of the real system using the mathematical model with a given system parameter vector. We can collect many experiences more efficiently than through interactions with the real system. However, it is difficult to identify the system parameter vector accurately. If we have an identification error, experiences obtained by the simulator may degrade the performance of the learned policy. Thus, we propose a practical RL algorithm that consists of two stages. At the first stage, we choose multiple system parameter vectors. Then, we have a mathematical model for each system parameter vector, which is called a virtual system. We obtain optimal Q-functions for multiple virtual systems using the continuous deep Q-learning algorithm. At the second stage, we represent a Q-function for the real system by a linear approximated function whose basis functions are optimal Q-functions learned at the first stage. The agent learns the Q-function through interactions with the real system online. By numerical simulations, we show the usefulness of our proposed method.
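The second-stage representation can be sketched as a weighted sum of fixed basis Q-functions whose weights are the only learned quantities. In the sketch below, the two quadratic basis functions stand in for the stage-one Q-functions, and the weights are fitted by per-sample regression to fixed targets rather than by the online temporal-difference learning the abstract describes. All names and values are assumptions for illustration.

```python
def q_real(weights, basis_qs, s, a):
    """Linear combination of basis Q-functions evaluated at (s, a)."""
    return sum(w * q(s, a) for w, q in zip(weights, basis_qs))


def fit_weights(basis_qs, samples, targets, lr=0.05, sweeps=2000):
    """Fit the combination weights by stochastic gradient steps.

    Only the weights change; the basis Q-functions stay fixed.
    """
    weights = [1.0 / len(basis_qs)] * len(basis_qs)
    for _ in range(sweeps):
        for (s, a), y in zip(samples, targets):
            err = y - q_real(weights, basis_qs, s, a)
            weights = [w + lr * err * q(s, a)
                       for w, q in zip(weights, basis_qs)]
    return weights


# Hypothetical stage-one Q-functions for two virtual systems.
def q1(s, a):
    return -(s * s + a * a)


def q2(s, a):
    return -(s * s + 3.0 * a * a)


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
samples = [(s, a) for s in grid for a in grid]
# Synthetic targets generated from known weights (0.3, 0.7).
targets = [0.3 * q1(s, a) + 0.7 * q2(s, a) for s, a in samples]
weights = fit_weights([q1, q2], samples, targets)
```

Because only a handful of weights are adjusted, far fewer real-system samples are needed than for training a full Q-network from scratch, which is the motivation stated in the abstract.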
GT Oct 11, 2020
Game-theoretic approach to decision-making problem for blockchain mining
Kosuke Toda, Naomi Kuze, Toshimitsu Ushio
An important decision-making problem for a miner in a blockchain network is whether to participate in the mining in order to earn a reward by creating a new block earlier than other miners. We formulate this decision-making problem as a noncooperative game, because the probability of creating a block depends not only on one's own available computational resources, but also on those of other miners. Through theoretical and numerical analyses, we show a hysteresis phenomenon of Nash equilibria depending on the reward and a jump phenomenon of miner decisions caused by a slight change in reward. We also show that the reward for which miners decide not to participate in the mining becomes smaller as the number of miners increases.
SY Jan 14, 2020
Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata
Ryohei Oura, Ami Sakakibara, Toshimitsu Ushio
This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to accepting sets. We take a product of the augmented LDGBA and the MDP, based on which we define a reward function. The agent gets rewards whenever state transitions are in an accepting set that has not been visited for a certain number of steps. Consequently, sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method can learn an optimal policy when the discount factor is sufficiently close to one.
LG Aug 28, 2019
Networked Control of Nonlinear Systems under Partial Observation Using Continuous Deep Q-Learning
Junya Ikemoto, Toshimitsu Ushio
In this paper, we propose a design of a model-free networked controller for a nonlinear plant whose mathematical model is unknown. In a networked control system, the controller and plant are located away from each other and exchange data over a network, which causes network delays that may fluctuate randomly due to network routing. We therefore assume that the current network delay is not known but the maximum value of the fluctuating network delays is known beforehand. Moreover, we also assume that the sensor cannot observe all state variables of the plant. Under these assumptions, we apply continuous deep Q-learning to the design of the networked controller. Then, we introduce an extended state consisting of a sequence of past control inputs and outputs as inputs to the deep neural network. By simulation, it is shown that, using the extended state, the controller can learn a control policy robust to the fluctuation of the network delays under the partial observation.
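The extended state described above can be sketched as a fixed-length history buffer that is flattened into one input vector for the network. The class name `ExtendedState`, the zero padding before the buffer fills, and the choice of `horizon` (for example, a value tied to the known maximum delay) are assumptions for illustration, not details taken from the paper.

```python
from collections import deque


class ExtendedState:
    """Fixed-length history of (output, control input) pairs."""

    def __init__(self, horizon, obs_dim, act_dim):
        # Assumption: pad with zeros until real data has arrived.
        zero = [0.0] * (obs_dim + act_dim)
        self.history = deque(
            [list(zero) for _ in range(horizon)], maxlen=horizon
        )

    def push(self, observation, action):
        # Appending to a full deque silently drops the oldest entry.
        self.history.append(list(observation) + list(action))

    def vector(self):
        # Flatten oldest-to-newest into one network input vector.
        return [v for step in self.history for v in step]


ext = ExtendedState(horizon=3, obs_dim=1, act_dim=1)
ext.push([1.0], [0.1])
ext.push([2.0], [0.2])
network_input = ext.vector()
```

The flattened vector has constant length `horizon * (obs_dim + act_dim)`, so it can be fed to a network with a fixed input size regardless of how the actual delay fluctuates.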
SY Jul 16, 2019
Model-free Control of Chaos with Continuous Deep Q-learning
Junya Ikemoto, Toshimitsu Ushio
The OGY method is one of the control methods for a chaotic system. In the method, we have to calculate a stabilizing periodic orbit embedded in its chaotic attractor. Thus, we cannot use this method in the case where a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in the delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. To overcome this problem, we propose a model-free reinforcement learning algorithm for the design of a controller for the chaotic system. In recent years, model-free reinforcement learning algorithms with deep neural networks have attracted much attention. Those algorithms make it possible to control complex systems. However, it is known that model-free reinforcement learning algorithms are not efficient because learners must explore their control policies over the entire state space. Moreover, model-free reinforcement learning algorithms with deep neural networks have the disadvantage of taking much time to learn their optimal control policies. Thus, we propose a data-based control policy consisting of two steps, where we first determine a region including the stabilizing periodic orbit and then make the controller learn an optimal control policy for its stabilization. In the proposed method, the controller efficiently explores its control policy only in the region.