15.7NIMay 4
Spatial-Temporal Learning-Based Distributed Routing for Dynamic LEO Satellite NetworksPo-Heng Chou, Chiapin Wang, Shou-Yu Chen et al.
In this paper, we propose a spatial-temporal learning-based distributed routing framework for dynamic Low Earth Orbit (LEO) satellite networks, where graph attention networks (GAT) and long short-term memory (LSTM) are integrated within a deep Q-network (DQN)-based architecture to enable distributed and adaptive routing decisions based on local observations. The routing problem is formulated as a partially observable Markov decision process (POMDP) to address partial observability under dynamic topology and time-varying traffic. Simulation results show that the proposed method significantly outperforms conventional and learning-based routing schemes in terms of throughput, packet loss, queue length, and end-to-end delay, while achieving proactive congestion avoidance with up to 23.26% queue reduction. In addition, the proposed approach maintains low computational overhead with negligible carbon emissions, demonstrating its efficiency from a Green AI perspective.
9.2ITMay 4
Dueling DDQN-Based Adaptive Multi-Objective Handover Optimization for LEO Satellite NetworksPo-Heng Chou, Chiapin Wang, Chung-Chi Huang et al.
In this paper, we propose a dueling double deep Q-network (DDQN)-based adaptive multi-objective handover framework for LEO satellite networks. The proposed method enables dynamic trade-off learning among throughput, blocking probability, and switching cost under time-varying network conditions. Simulation results demonstrate that the proposed approach consistently outperforms conventional baselines, achieving up to 10.3% throughput improvement and near-zero blocking under typical operating conditions.
25.4NIMay 1
A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi CoexistencePo-Heng Chou, Yi-Fang Yu, Shou-Yu Chen et al.
The coexistence of NR-U and Wi-Fi in unlicensed spectrum introduces a system-level resource coordination problem, where heterogeneous channel access mechanisms lead to a significant imbalance in spectrum utilization and degraded Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive TXOP control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is the introduction of a policy layer via reward design, enabling explicit control of system-level tradeoffs among fairness, throughput, and quality of service (QoS). Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to achieve different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective solution for managing tradeoffs in heterogeneous coexistence networks.
SPNov 12, 2025
DRL-Based Beam Positioning for LEO Satellite Constellations with Weighted Least SquaresPo-Heng Chou, Chiapin Wang, Kuan-Hao Chen et al.
In this paper, we propose a reinforcement learning based beam weighting framework that couples a policy network with an augmented weighted least squares (WLS) estimator for accurate and low-complexity positioning in multi-beam LEO constellations. Unlike conventional geometry or CSI-dependent approaches, the policy learns directly from uplink pilot responses and geometry features, enabling robust localization without explicit CSI estimation. An augmented WLS jointly estimates position and receiver clock bias, improving numerical stability under dynamic beam geometry. Across representative scenarios, the proposed method reduces the mean positioning error by 99.3% compared with the geometry-based baseline, achieving 0.395 m RMSE with near real-time inference.