AIFeb 23, 2023
Intermittently Observable Markov Decision ProcessesGongpu Chen, Soung-Chang Liew
This paper investigates MDPs with intermittent state information. We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel. The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process. Hence, the problem is finding an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results. The effect of state information losses on the expected total discounted reward is studied systematically. Then, we reformulate the problem as a tree MDP whose state space is organized in a tree structure. Two finite-state approximations to the tree MDP are developed to find near-optimal policies efficiently. Finally, we put forth a nested value iteration algorithm for the finite-state approximations, which is proved to be faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.
LGAug 19, 2024
GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed BanditsGongpu Chen, Soung Chang Liew, Deniz Gunduz
The restless multi-armed bandit (RMAB) framework is a popular model with applications across a wide variety of fields. However, its solution is hindered by the exponentially growing state space (with respect to the number of arms) and the combinatorial action space, making traditional reinforcement learning methods infeasible for large-scale instances. In this paper, we propose GINO-Q, a three-timescale stochastic approximation algorithm designed to learn an asymptotically optimal index policy for RMABs. GINO-Q mitigates the curse of dimensionality by decomposing the RMAB into a series of subproblems, each with the same dimension as a single arm, ensuring that complexity increases linearly with the number of arms. Unlike recently developed Whittle-index-based algorithms, GINO-Q does not require RMABs to be indexable, enhancing its flexibility and applicability. Our experimental results demonstrate that GINO-Q consistently learns near-optimal policies, even for non-indexable RMABs where Whittle-index-based algorithms perform poorly, and it converges significantly faster than existing baselines.
43.3OCMay 3
The Control Plant as A Communication Channel: Implicit Communication for Decentralized LQG ControlGongpu Chen, Deniz Gündüz
We study a decentralized linear quadratic Gaussian control problem, in which a leader and a follower must steer a linear system to a target state. The target state is known only to the leader, and no explicit communication channel exists between the agents. To address the challenge posed by this asymmetric information structure, we propose an integrated communication and control (ICoCo) framework in which the control plant itself serves as a communication channel: the leader encodes the target state into its control input through an additive communication term, and the follower decodes it from the resulting state trajectory. We design an implicit coordination scheme based on joint source-channel coding ideas, and prove that the follower's estimation error decreases monotonically to zero, enabling the two agents to coordinate increasingly well and ultimately steer the system to the target state. We then formulate the design of the communication power as an optimal control problem to minimize the overall control cost. In the fully actuated leader case, we derive necessary optimality conditions and in the under-actuated case, we solve the problem numerically. Numerical results show that the proposed scheme effectively coordinates the two agents and achieves a control cost close to that of the explicit-communication lower bound.