Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications
This work addresses latency-aware semantic adaptation for immersive and safety-critical services. Its contribution is incremental: it integrates human feedback and latency control into existing methods.
The paper tackles the problem of reconciling semantic fidelity with latency guarantees in semantic communications. It introduces a time-constrained human-in-the-loop reinforcement learning framework that, in simulation, consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption.
Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware open radio access network (Open RAN) architecture. We formulate semantic adaptation driven by human feedback as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal-dual proximal policy optimization (PPO) algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while reducing the variability of both the air-interface and the near-real-time RAN intelligent controller processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption, providing a practical blueprint for latency-aware semantic adaptation.
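To make the three mechanisms named above concrete, the following is a minimal sketch of latency-aware reward shaping, a projected dual (Lagrange multiplier) update, and action shielding. It is not the paper's implementation: the function names, the hinge-style deadline penalty, the step size, and the latency-prediction table are all assumptions chosen for illustration, and the PPO policy update itself is omitted.

```python
from typing import Dict, List


def shaped_reward(semantic_reward: float, latency_ms: float,
                  deadline_ms: float, lam: float) -> float:
    """Lagrangian-shaped reward (assumed form): semantic utility minus
    a multiplier-weighted penalty on deadline overshoot."""
    violation = max(0.0, latency_ms - deadline_ms)
    return semantic_reward - lam * violation


def dual_update(lam: float, mean_latency_ms: float,
                deadline_ms: float, step_size: float = 0.01) -> float:
    """Projected dual ascent: increase the multiplier when the average
    latency exceeds the deadline, decrease it otherwise, and clip at zero."""
    return max(0.0, lam + step_size * (mean_latency_ms - deadline_ms))


def shield(actions: List[str], predicted_latency_ms: Dict[str, float],
           slack_ms: float) -> List[str]:
    """Action shielding (hypothetical rule): keep only actions whose
    predicted latency fits within the remaining slack. If none fit,
    fall back to the single fastest action."""
    safe = [a for a in actions if predicted_latency_ms[a] <= slack_ms]
    if safe:
        return safe
    return [min(actions, key=lambda a: predicted_latency_ms[a])]
```

In a training loop under these assumptions, the policy would sample only from `shield(...)`, collect `shaped_reward(...)` for the primal PPO step, and call `dual_update(...)` once per batch so that the penalty weight grows only while the latency constraint is being violated.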