Gaoyang Pang

SY
h-index116
5papers
36citations
Novelty53%
AI Score35

5 Papers

CVNov 7, 2025
Medical Referring Image Segmentation via Next-Token Mask Prediction

Xinyu Chen, Yiran Wang, Gaoyang Pang et al.

Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches usually involve complex design of multimodal fusion or multi-stage decoders. In this work, we propose NTP-MRISeg, a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations. This formulation streamlines model design by eliminating the need for modality-specific fusion and external segmentation models, supports a unified architecture for end-to-end training. It also enables the use of pretrained tokenizers from emerging large-scale multimodal models, enhancing generalization and adaptability. More importantly, to address challenges under this formulation-such as exposure bias, long-tail token distributions, and fine-grained lesion edges-we propose three novel strategies: (1) a Next-k Token Prediction (NkTP) scheme to reduce cumulative prediction errors, (2) Token-level Contrastive Learning (TCL) to enhance boundary sensitivity and mitigate long-tail distribution effects, and (3) a memory-based Hard Error Token (HET) optimization strategy that emphasizes difficult tokens during training. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets demonstrate that NTP-MRISeg achieves new state-of-the-art performance, offering a streamlined and effective alternative to traditional MRIS pipelines.

AIAug 15, 2024
BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination

Xin Hao, Bahareh Nakisa, Mohmmad Naim Rastgoo et al.

Deep reinforcement Learning (DRL) offers a powerful framework for training AI agents to coordinate with human partners. However, DRL faces two critical challenges in human-AI coordination (HAIC): sparse rewards and unpredictable human behaviors. These challenges significantly limit DRL to identify effective coordination policies, due to its impaired capability of optimizing exploration and exploitation. To address these limitations, we propose an innovative behavior- and context-aware reward (BCR) for DRL, which optimizes exploration and exploitation by leveraging human behaviors and contextual information in HAIC. Our BCR consists of two components: (i) A novel dual intrinsic rewarding scheme to enhance exploration. This scheme composes an AI self-motivated intrinsic reward and a human-motivated intrinsic reward, which are designed to increase the capture of sparse rewards by a logarithmic-based strategy; and (ii) A new context-aware weighting mechanism for the designed rewards to improve exploitation. This mechanism helps the AI agent prioritize actions that better coordinate with the human partner by utilizing contextual information that can reflect the evolution of learning. Extensive simulations in the Overcooked environment demonstrate that our approach can increase the cumulative sparse rewards by approximately 20%, and improve the sample efficiency by around 38% compared to state-of-the-art baselines.

SYOct 15, 2024
Communication-Control Codesign for Large-Scale Wireless Networked Control Systems

Gaoyang Pang, Wanchun Liu, Dusit Niyato et al.

Wireless Networked Control Systems (WNCSs) are essential to Industry 4.0, enabling flexible control in applications, such as drone swarms and autonomous robots. The interdependence between communication and control requires integrated design, but traditional methods treat them separately, leading to inefficiencies. Current codesign approaches often rely on simplified models, focusing on single-loop or independent multi-loop systems. However, large-scale WNCSs face unique challenges, including coupled control loops, time-correlated wireless channels, trade-offs between sensing and control transmissions, and significant computational complexity. To address these challenges, we propose a practical WNCS model that captures correlated dynamics among multiple control loops with spatially distributed sensors and actuators sharing limited wireless resources over multi-state Markov block-fading channels. We formulate the codesign problem as a sequential decision-making task that jointly optimizes scheduling and control inputs across estimation, control, and communication domains. To solve this problem, we develop a Deep Reinforcement Learning (DRL) algorithm that efficiently handles the hybrid action space, captures communication-control correlations, and ensures robust training despite sparse cross-domain variables and floating control inputs. Extensive simulations show that the proposed DRL approach outperforms benchmarks and solves the large-scale WNCS codesign problem, providing a scalable solution for industrial automation.

ITOct 18, 2024
Wireless Human-Machine Collaboration in Industry 5.0

Gaoyang Pang, Wanchun Liu, Dusit Niyato et al.

Wireless Human-Machine Collaboration (WHMC) represents a critical advancement for Industry 5.0, enabling seamless interaction between humans and machines across geographically distributed systems. As the WHMC systems become increasingly important for achieving complex collaborative control tasks, ensuring their stability is essential for practical deployment and long-term operation. Stability analysis certifies how the closed-loop system will behave under model randomness, which is essential for systems operating with wireless communications. However, the fundamental stability analysis of the WHMC systems remains an unexplored challenge due to the intricate interplay between the stochastic nature of wireless communications, dynamic human operations, and the inherent complexities of control system dynamics. This paper establishes a fundamental WHMC model incorporating dual wireless loops for machine and human control. Our framework accounts for practical factors such as short-packet transmissions, fading channels, and advanced HARQ schemes. We model human control lag as a Markov process, which is crucial for capturing the stochastic nature of human interactions. Building on this model, we propose a stochastic cycle-cost-based approach to derive a stability condition for the WHMC system, expressed in terms of wireless channel statistics, human dynamics, and control parameters. Our findings are validated through extensive numerical simulations and a proof-of-concept experiment, where we developed and tested a novel wireless collaborative cart-pole control system. The results confirm the effectiveness of our approach and provide a robust framework for future research on WHMC systems in more complex environments.

SYSep 26, 2021
Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control

Gaoyang Pang, Kang Huang, Daniel E. Quevedo et al.

We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition of the WNCS, which is stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that can stabilize all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem into a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.