Biao He

AI
h-index 23
3 papers
68 citations
Novelty 52%
AI Score 36

3 Papers

AI · Sep 30, 2025
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Siyu Zhu, Yanbin Jiang, Hejian Sang et al.

We investigated Agentic RL with large language models on the TravelPlanner benchmark. Our approach, Planner-R1, achieved a 56.9% final-pass rate with only 180 training queries, a 2.7× improvement over GPT-5's 21.2% baseline and the strongest agentic result on the public leaderboard. A central finding was that smaller models (8B) were highly responsive to reward shaping: with dense process-level signals, they reached competitive performance while being 3.5× more compute-efficient and 1.5× more memory-efficient than 32B models. Larger models were more robust under sparse rewards but exhibited smaller relative gains from shaping and higher variance across runs. While curriculum learning offered no significant benefit, shaped rewards consistently amplified learning dynamics, making 8B models the most efficient setting for agentic RL. Crucially, these gains did not come at the cost of overfitting: fine-tuned models mostly maintained or exceeded baseline performance on out-of-domain tasks, including Multi-IF, NaturalPlan, and τ-Bench. These results establish reward shaping as a decisive lever for scaling agentic RL, highlight the competitive strength of smaller models, and demonstrate that efficiency can be achieved without sacrificing generalization.
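To make the sparse-versus-dense distinction concrete, here is a minimal sketch of two reward functions for a plan validator. Everything in it is hypothetical: the `PlanCheck` type, the per-check pass list, and the partial-credit-plus-bonus scheme are illustrative assumptions, not Planner-R1's actual reward or TravelPlanner's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class PlanCheck:
    """Hypothetical result of validating one generated travel plan."""
    passed: list[bool]  # one entry per constraint check (e.g. budget, dates, cuisine)


def sparse_reward(check: PlanCheck) -> float:
    # Final-pass style: reward only a plan that satisfies every check.
    if not check.passed:
        return 0.0
    return 1.0 if all(check.passed) else 0.0


def shaped_reward(check: PlanCheck, bonus: float = 1.0) -> float:
    # Dense (assumed) shaping: partial credit for each satisfied check,
    # plus a bonus when the plan passes all of them.
    if not check.passed:
        return 0.0
    fraction = sum(check.passed) / len(check.passed)
    return fraction + (bonus if all(check.passed) else 0.0)


if __name__ == "__main__":
    near_miss = PlanCheck(passed=[True] * 7 + [False])
    print(sparse_reward(near_miss), shaped_reward(near_miss))
```

A plan that passes 7 of 8 checks earns 0 under the sparse reward but 0.875 under the shaped one, so the policy receives a learning signal even before it produces a fully valid plan.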

CR · Jul 20, 2018
On Secure Transmission Design: An Information Leakage Perspective

Yong Huang, Wei Wang, Biao He et al.

Information leakage rate is an intuitive metric that reflects the level of security in a wireless communication system; however, few studies have taken it into consideration. Existing work on information leakage rate has two major limitations, both due to the complicated expression for the leakage rate: 1) the analytical and numerical results give few insights into the trade-off between system throughput and information leakage rate; and 2) the corresponding optimal designs of transmission rates are not analytically tractable. To overcome these limitations and obtain an in-depth understanding of information leakage rate in secure wireless communications, we propose an approximation for the average information leakage rate in the fixed-rate transmission scheme. Unlike the complicated expression for information leakage rate in the literature, our proposed approximation has a low-complexity expression and is therefore easy to use in further analysis. Based on this approximation, we obtain the corresponding approximate optimal transmission rates for two transmission schemes with different design objectives. Through analytical and numerical results, we find that for the system maximizing throughput subject to an information leakage rate constraint, the throughput is a concave, non-decreasing function of the security constraint, and an overly loose security constraint does not contribute to higher throughput; for the system minimizing information leakage rate subject to a throughput constraint, the average information leakage rate is a convex, increasing function of the throughput constraint.

IT · Jan 31, 2017
Covert Communication with Finite Blocklength in AWGN Channels

Shihao Yan, Biao He, Yirui Cong et al.

Covert communication aims to achieve reliable transmission from a transmitter to a receiver while guaranteeing an arbitrarily small probability that the transmission is detected by a warden. In this work, we study covert communication in AWGN channels with finite blocklength, in which the number of channel uses is finite. Specifically, we analytically prove that the entire block (all available channel uses) should be utilized to maximize the effective throughput of the transmission subject to a predetermined covertness requirement. This is a nontrivial result because more channel uses give the warden more observations for detecting the transmission. We also determine the maximum allowable transmit power per channel use, which is shown to decrease as the blocklength increases. Despite this decrease, the maximum allowable total power over the entire block is proved to increase with the blocklength, and consequently the effective throughput also increases with the blocklength.
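The power-versus-blocklength trade-off can be illustrated numerically under a common textbook-style assumption, which is not necessarily the exact constraint used in the paper. Assume the warden sees N(0, 1) per channel use when the transmitter is silent (P0) and N(0, 1 + SNR) when it sends Gaussian symbols (P1), and assume covertness is enforced as n · D(P0 || P1) ≤ 2ε² (a KL-divergence bound obtained through Pinsker's inequality). The helper names `kl_per_use` and `max_snr_per_use` and the value ε = 0.1 are hypothetical.

```python
import math


def kl_per_use(snr: float) -> float:
    # D(P0 || P1) in nats for one channel use, with P0 = N(0, 1) and
    # P1 = N(0, 1 + snr); noise power is normalized to 1 (snr = P / sigma^2).
    # Closed form: 0.5 * (ln(1 + snr) - snr / (1 + snr)), increasing in snr.
    return 0.5 * (math.log1p(snr) - snr / (1.0 + snr))


def max_snr_per_use(n: int, eps: float) -> float:
    # Largest per-use SNR satisfying the ASSUMED constraint n * D <= 2 * eps**2,
    # found by bisection (valid because kl_per_use is monotone increasing).
    budget = 2.0 * eps ** 2 / n
    lo, hi = 0.0, 1.0
    while kl_per_use(hi) < budget:  # widen the bracket until the constraint is violated
        hi *= 2.0
    for _ in range(100):  # fixed iteration count avoids float-resolution stalls
        mid = 0.5 * (lo + hi)
        if kl_per_use(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    eps = 0.1  # hypothetical covertness level
    for n in (100, 1_000, 10_000):
        snr = max_snr_per_use(n, eps)
        print(f"n={n:>6}  max SNR per use={snr:.5f}  total n*SNR={n * snr:.3f}")
```

Under these assumptions the per-use SNR shrinks roughly like 1/√n (since D ≈ SNR²/4 for small SNR), while the total n · SNR grows roughly like √n, matching the qualitative behavior described in the abstract.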