SY Jul 22, 2019
Categorization Problem on Controllability of Boolean Control Networks
Qunxi Zhu, Zuguang Gao, Yang Liu et al.
A Boolean control network (BCN) is a discrete-time dynamical system whose variables take values in the binary set $\{0,1\}$. At each time step, all variables of the BCN update their values simultaneously, each according to a Boolean function that takes the state and control of the previous time step as its input. Given an ordered pair of states of a BCN, we define the set of reachable time steps as the set of positive integers $k$ for which there exists a control sequence that steers the BCN from the first state to the second in exactly $k$ time steps, and the set of unreachable time steps as the set of positive integers $k$ for which no such control sequence exists. In this paper we consider the so-called categorization problem of a BCN: we develop a method, via an algebraic graph-theoretic approach, to determine whether the set of reachable time steps and the set of unreachable time steps associated with a given pair of states are finite or infinite. Our results can be applied to classify all ordered pairs of states into four categories, depending on whether the set of reachable (unreachable) time steps is finite or not.
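To make the definitions of reachable and unreachable time steps concrete, here is a minimal brute-force sketch in Python. It is not the algebraic graph-theoretic method of the paper. It tracks the set of states reachable from the source in exactly $k$ steps and stops when that set repeats, since from then on the sequence of sets is periodic. The two-variable network `toy_step` and the names `successors` and `categorize` are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def successors(step, state, n_controls):
    """All states reachable from `state` in one step under some control input."""
    return {step(state, u) for u in product((0, 1), repeat=n_controls)}


def categorize(step, n_controls, source, target):
    """Return (reachable_is_infinite, unreachable_is_infinite) for (source, target).

    history[k-1] holds the set of states reachable from `source` in exactly
    k steps. Each set is determined by the previous one, so once a set
    repeats, the sequence cycles forever.
    """
    seen = {}      # set of states -> index in `history` where it first appeared
    history = []
    current = frozenset(successors(step, source, n_controls))
    while current not in seen:
        seen[current] = len(history)
        history.append(current)
        current = frozenset(
            nxt for s in current for nxt in successors(step, s, n_controls)
        )
    cycle = history[seen[current]:]   # the part that repeats indefinitely
    reachable_infinite = any(target in r for r in cycle)
    unreachable_infinite = any(target not in r for r in cycle)
    return reachable_infinite, unreachable_infinite


def toy_step(state, control):
    """Hypothetical BCN: x1' = x2, x2' = x1 AND u (illustration only)."""
    x1, x2 = state
    (u,) = control
    return (x2, x1 & u)


if __name__ == "__main__":
    for target in [(0, 0), (0, 1), (1, 1)]:
        print(target, categorize(toy_step, 1, (1, 0), target))
```

In this toy network, starting from state $(1,0)$, the state $(0,1)$ is reachable at odd $k$ only, so both of its sets are infinite, while $(1,1)$ is never reachable. The enumeration is exponential in the number of variables, so it is only suitable for very small networks.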
GT Dec 15, 2021
Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games
Zuguang Gao, Qianqian Ma, Tamer Başar et al.
Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs) - weakly acyclic SGs, which includes the common cooperative MARL setting with an identical reward to all agents (a Markov team problem) as a special case. We focus on the practical yet challenging setting of fully decentralized MARL, where each agent can observe neither the rewards nor the actions of the other agents. In fact, each agent is completely oblivious to the presence of other decision makers. We consider both the tabular and the linear function approximation cases. In the tabular setting, we analyze the sample complexity for the decentralized Q-learning algorithm to converge to a Markov perfect equilibrium (Nash equilibrium). With linear function approximation, the results are for convergence to a linear approximated equilibrium - a new notion of equilibrium that we propose - in which each agent's policy is a best reply (to the other agents) within a linear space. Numerical experiments are also provided for both settings to demonstrate the results.
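The abstract does not spell out the algorithm, so the following is only a sketch of the information structure it describes: an agent that keeps a tabular Q-function over its own actions and updates it from its own reward alone, with no access to the other agents' actions or rewards. The update shown is the textbook tabular Q-learning rule, not necessarily the paper's exact scheme. The class name `ObliviousQAgent` and the values of `alpha` and `gamma` are illustrative assumptions.

```python
class ObliviousQAgent:
    """One agent's local learner; it never sees other agents' actions or rewards."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9):
        # Q-table indexed by (state, own action) only.
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha = alpha   # step size (assumed value)
        self.gamma = gamma   # discount factor (assumed value)

    def update(self, state, own_action, own_reward, next_state):
        """Standard Q-learning update using only locally observed quantities."""
        target = own_reward + self.gamma * max(self.q[next_state])
        self.q[state][own_action] += self.alpha * (target - self.q[state][own_action])

    def greedy_action(self, state):
        """Own action with the highest current Q-value in `state`."""
        row = self.q[state]
        return max(range(len(row)), key=lambda a: row[a])


if __name__ == "__main__":
    agent = ObliviousQAgent(n_states=2, n_actions=2)
    agent.update(state=0, own_action=1, own_reward=1.0, next_state=1)
    print(agent.q[0], agent.greedy_action(0))
```

From a single agent's point of view, the other agents are folded into the environment. This makes the environment non-stationary, which is why convergence guarantees in this setting are nontrivial.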