IT · Jun 5
Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels
Hao Wu, Shengtian Yang, Huiguo Gao et al.
This paper studies online power control for battery-limited point-to-point energy harvesting communications over slow block-fading channels. A linear-policy-based approximation is developed for the relative-value function in the Bellman equation of the power control problem. This approximation leads to two fundamental parameterized clipped affine policies: an optimistic policy derived from a certainty-equivalence-type approximation and a robust policy derived from worst-case analysis. For independent and identically distributed energy arrivals and channel states, two families of power control schemes are developed based on the optimistic clipped affine (OCA) and robust clipped affine (RCA) policies, respectively. The proposed adaptive RCA policy based on reinforcement learning (RCA-RL) is further extended to address four scenarios with contextual information: one-step energy lookahead, one-step channel lookahead, one-step joint energy-channel lookahead, and Markov energy arrivals. Extensive simulation results show that the proposed schemes provide a favorable tradeoff between computational complexity and performance. The adaptive RCA policy based on the maximin optimal linear-policy-slope approximation (RCA-OLA-A) and the RCA-RL scheme achieve the best overall performance, while the RCA policy based on the maximin optimal linear policy (RCA-OL) is the best-performing closed-form policy. In particular, RCA-OLA-A, RCA-RL, and the aforementioned RCA-RL extensions achieve less than 2% performance loss relative to the optimal policy across a range of scenarios, consistently outperforming the considered benchmark approaches, including generic reinforcement learning baselines. The RCA-OL policy also performs well with less than 4% performance loss.
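To make the idea of a clipped affine rule concrete, here is a minimal Python sketch. It assumes that the transmit energy in each block is an affine function of the current battery level, clipped to lie between zero and the stored energy. It also assumes that energy harvested in a block is stored before that block's transmit energy is chosen, and that the per-block rate is log2(1 + g·p) for channel gain g. The names `clipped_affine_power`, `simulate`, and `average_rate` and the parameters `slope` and `intercept` are hypothetical. The paper's OCA and RCA policies derive their parameters from the Bellman-equation approximation, which this sketch does not reproduce.

```python
import math
from typing import List, Sequence


def clipped_affine_power(battery: float, slope: float, intercept: float) -> float:
    """Hypothetical clipped affine rule: affine in the battery level,
    clipped to the feasible interval [0, battery]."""
    raw = slope * battery + intercept
    return min(max(raw, 0.0), battery)


def simulate(arrivals: Sequence[float], capacity: float,
             slope: float, intercept: float) -> List[float]:
    """Apply the rule block by block with a finite battery of size `capacity`.

    Assumption: energy arriving in a block is stored (up to capacity)
    before the transmit energy for that block is chosen.
    """
    battery = 0.0
    spent = []
    for energy in arrivals:
        battery = min(battery + energy, capacity)  # harvest, then enforce capacity
        power = clipped_affine_power(battery, slope, intercept)
        spent.append(power)
        battery -= power  # energy used in this block leaves the battery
    return spent


def average_rate(powers: Sequence[float], gains: Sequence[float]) -> float:
    """Average of log2(1 + g * p) over blocks (assumed rate model)."""
    return sum(math.log2(1.0 + g * p) for p, g in zip(powers, gains)) / len(powers)


if __name__ == "__main__":
    arrivals = [1.0, 0.0, 2.0, 0.5]  # made-up energy arrivals
    gains = [0.8, 1.2, 0.5, 1.0]     # made-up channel gains
    powers = simulate(arrivals, capacity=2.0, slope=0.5, intercept=0.1)
    print(powers, average_rate(powers, gains))
```

With `intercept = 0` the rule reduces to spending a fixed fraction `slope` of the battery in every block (assuming 0 ≤ slope ≤ 1). A positive intercept makes it drain a small battery completely, while a negative one makes it hold back until the battery exceeds a threshold.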
CV · Oct 24, 2025
MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection
Shengtian Yang, Yue Feng, Yingshi Liu et al.
Video Anomaly Detection (VAD) aims to locate unusual activities or behaviors within videos. Recently, offline VAD has attracted substantial research attention, invigorated by progress in large language models (LLMs) and vision-language models (VLMs), which offer the potential for a more nuanced understanding of anomalies. However, online VAD has received little attention because of its real-time constraints and computational intensity. In this paper, we introduce MoniTor, a novel Memory-based online scoring queue scheme for Training-free VAD, to address the inherent complexities of online VAD. Specifically, MoniTor feeds streaming input to VLMs, leveraging the capabilities of pre-trained large-scale models. To capture temporal dependencies more effectively, we incorporate a novel prediction mechanism inspired by Long Short-Term Memory (LSTM) networks, which lets the model track past states and leverage previous predictions to identify anomalous behaviors, and thereby better understand the current frame. Moreover, we design a scoring queue that dynamically stores recent scores and an anomaly prior that covers all anomalies in the monitoring scenario; together they guide LLMs in distinguishing between normal and abnormal behaviors over time. We evaluate MoniTor on two large datasets (UCF-Crime and XD-Violence) containing various surveillance and real-world scenarios. The results demonstrate that MoniTor outperforms state-of-the-art methods and is competitive with weakly supervised methods despite requiring no training. Code is available at https://github.com/YsTvT/MoniTor.
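The scoring-queue idea can be illustrated with a small Python sketch. Everything here is a hypothetical simplification: `ScoringQueue`, `memory_weight`, and `build_prompt` are illustrative names. The LSTM-inspired mechanism is reduced to a single fixed blending weight (a real LSTM gate is learned and input-dependent), and the raw per-frame score is assumed to come from a VLM/LLM call that is not shown. The queue keeps only the most recent scores, and the anomaly prior is modeled as a plain list of anomaly categories inserted into the prompt.

```python
from collections import deque
from typing import Iterable


class ScoringQueue:
    """Hypothetical fixed-length memory of recent frame-level anomaly scores."""

    def __init__(self, maxlen: int = 5, memory_weight: float = 0.6) -> None:
        self.scores = deque(maxlen=maxlen)  # oldest scores drop out automatically
        self.memory_weight = memory_weight  # fixed stand-in for an LSTM-style gate

    def update(self, raw_score: float) -> float:
        """Blend the new raw score with the most recent stored score."""
        if self.scores:
            previous = self.scores[-1]
            score = self.memory_weight * previous + (1.0 - self.memory_weight) * raw_score
        else:
            score = raw_score  # no history yet: use the raw score as-is
        self.scores.append(score)
        return score

    def build_prompt(self, anomaly_prior: Iterable[str]) -> str:
        """Compose guidance text from the anomaly prior and recent scores."""
        history = ", ".join(f"{s:.2f}" for s in self.scores)
        return (
            f"Possible anomalies in this scene: {', '.join(anomaly_prior)}. "
            f"Recent anomaly scores: [{history}]. "
            "Rate the current frame from 0 (normal) to 1 (abnormal)."
        )


if __name__ == "__main__":
    queue = ScoringQueue(maxlen=3, memory_weight=0.5)
    for raw in [0.1, 0.9, 0.8, 0.2]:  # placeholder raw scores, not real model outputs
        queue.update(raw)
    print(queue.build_prompt(["fighting", "arson", "road accident"]))
```

A bounded `deque` keeps memory constant regardless of stream length, which suits the real-time constraints of online VAD.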
AI · Feb 19
Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
Shengtian Yang, Yu Li, Shuo He et al.
Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods typically use a single policy network, which causes simplicity bias: simple tasks occupy most parameters and dominate gradient updates, leaving insufficient capacity for complex tasks. A plausible remedy is to employ the Mixture-of-Experts (MoE) architecture in the policy network, since MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. However, a key limitation of traditional MoE is its token-level routing: the router assigns each token to specialized experts, which fragments phase-consistent patterns into scattered expert assignments and thus undermines expert specialization. In this paper, we propose Phase-Aware Mixture of Experts (PA-MoE). It first features a lightweight phase router that learns latent phase boundaries directly from the RL objective without predefining phase categories. The phase router then keeps assignments temporally consistent, routing each phase to the same expert, which allows experts to preserve phase-specific expertise. Experimental results demonstrate the effectiveness of PA-MoE.
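The contrast between token-level and phase-level routing can be shown with a toy Python sketch. It is not PA-MoE: here the phase boundaries are passed in as fixed indices, whereas the paper's phase router learns latent boundaries from the RL objective. Pooling router logits by averaging over a phase is also an assumption made only for illustration. `token_level_routing`, `phase_level_routing`, and `num_switches` are hypothetical helpers.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def token_level_routing(logits: Sequence[Sequence[float]]) -> List[int]:
    """Conventional MoE-style routing: each token picks its own expert."""
    return [argmax(row) for row in logits]


def phase_level_routing(logits: Sequence[Sequence[float]],
                        boundaries: Sequence[int]) -> List[int]:
    """Route every token of a phase to one expert.

    `boundaries` holds the sorted start index of each phase and must begin at 0.
    The phase's expert is the argmax of the router logits averaged over the phase.
    """
    assignments: List[int] = []
    ends = list(boundaries[1:]) + [len(logits)]
    for start, end in zip(boundaries, ends):
        segment = logits[start:end]
        n_experts = len(segment[0])
        mean = [sum(row[k] for row in segment) / len(segment) for k in range(n_experts)]
        assignments.extend([argmax(mean)] * (end - start))
    return assignments


def num_switches(assignments: Sequence[int]) -> int:
    """Count how often consecutive tokens change expert (fragmentation measure)."""
    return sum(a != b for a, b in zip(assignments, assignments[1:]))


if __name__ == "__main__":
    # Toy logits for 5 tokens and 2 experts; values are made up.
    logits = [[2.0, 1.0], [0.9, 1.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
    tok = token_level_routing(logits)
    pha = phase_level_routing(logits, boundaries=[0, 3])
    print(tok, num_switches(tok))
    print(pha, num_switches(pha))
```

On this toy input, per-token routing switches experts three times while phase-level routing switches once, which mirrors the fragmentation effect the abstract attributes to token-level routing.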