82.2 DC May 16
GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
Boxiao Du, Boning Huangfu, Yizhou Luo et al.
Large Language Models (LLMs) play a central role in emerging agentic applications, where the timely completion of each inference as a whole is critical. Meanwhile, agentic LLM inferences are increasingly served on heterogeneous GPUs in operators' resource pools. It is therefore crucial to route incoming inference requests to appropriate GPUs so that their end-to-end latency requirements are satisfied whenever possible, thereby achieving high goodput. In this paper, we propose GoodServe, a goodput-optimized serving system for agentic inferences over heterogeneous resources. GoodServe performs inference routing in a predict-and-rectify manner. It estimates request output lengths and GPU serving status in a way that is both accurate and practical. Based on this information from the demand and resource sides, it then makes high-quality routing decisions using a just-enough instance selection heuristic. It also periodically monitors the SLO-violation risk of active requests and triggers runtime request migrations to handle unexpected dynamics. Our evaluations show that GoodServe improves goodput by up to 27.4% over existing routing methods.
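The abstract does not spell out the just-enough instance selection heuristic, so the following is only a minimal sketch of one plausible reading: among the instances predicted to meet a request's latency SLO, pick the least capable one, leaving faster GPUs free for tighter requests. The `Instance` fields, the linear latency model, and the function names are all hypothetical and are not taken from GoodServe.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instance:
    name: str
    tokens_per_second: float  # hypothetical estimate of decode throughput
    queue_delay_s: float      # hypothetical estimate of current queueing delay


def predicted_latency_s(inst: Instance, predicted_output_tokens: int) -> float:
    # Assumed latency model: queueing delay plus decode time for the predicted length.
    return inst.queue_delay_s + predicted_output_tokens / inst.tokens_per_second


def select_just_enough(
    instances: Sequence[Instance],
    predicted_output_tokens: int,
    slo_s: float,
) -> Optional[Instance]:
    # Keep only instances whose predicted latency fits within the SLO.
    feasible = [
        inst for inst in instances
        if predicted_latency_s(inst, predicted_output_tokens) <= slo_s
    ]
    if not feasible:
        return None  # no instance is predicted to meet the SLO
    # "Just enough" (assumed meaning): the slowest instance that is still feasible.
    return min(feasible, key=lambda inst: inst.tokens_per_second)


pool = [
    Instance("fast", tokens_per_second=100.0, queue_delay_s=0.5),
    Instance("mid", tokens_per_second=50.0, queue_delay_s=0.2),
    Instance("slow", tokens_per_second=20.0, queue_delay_s=0.1),
]
choice = select_just_enough(pool, predicted_output_tokens=200, slo_s=5.0)
```

In this toy pool, a 200-token request with a 5-second SLO rules out the slow instance, and the sketch prefers the mid-tier instance over the fast one. A real system would also need the rectify step (risk monitoring and migration), which this sketch omits.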
NI Jul 5, 2025
Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks
Yizhou Luo, Kwan-Wu Chin, Ruyi Guan et al.
Devices operating in Internet of Things (IoT) networks may be deployed across vast geographical areas and interconnected via multi-hop communications. Further, they may be unguarded. This makes them vulnerable to attacks and motivates operators to check on devices frequently. To this end, we propose and study an Unmanned Aerial Vehicle (UAV)-aided attestation framework for use in IoT networks with a solar-powered charging station. A key challenge is optimizing the trajectory of the UAV to ensure it attests as many devices as possible. A trade-off here is that devices being checked by the UAV are offline, which reduces the amount of data delivered to a gateway. Another challenge is that the charging station experiences time-varying energy arrivals, which in turn affect the flight duration and charging schedule of the UAV. To address these challenges, we employ a Deep Reinforcement Learning (DRL) solution to optimize the UAV's charging schedule and the selection of devices to be attested during each flight. The simulation results show that our solution reduces the average age of trust by 88% and throughput loss due to attestation by 30%.
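The abstract does not define age of trust. As a rough illustration only, the sketch below assumes it behaves like age of information: a device's age at a time slot is the number of slots since its last attestation, and the reported metric is the average over all devices and slots. The function name, the slotted time model, and the assumption that every device is trusted at slot 0 are hypothetical.

```python
from typing import Dict, Set


def average_age_of_trust(
    num_devices: int,
    horizon: int,
    attestations: Dict[int, Set[int]],
) -> float:
    """Average age over devices and slots 1..horizon.

    attestations maps a time slot to the set of device ids attested in that slot.
    """
    # Assumption: every device was last attested (trusted) at slot 0.
    last_attested = [0] * num_devices
    total_age = 0
    for t in range(1, horizon + 1):
        # An attestation in slot t resets that device's age to zero.
        for device in attestations.get(t, set()):
            last_attested[device] = t
        total_age += sum(t - last for last in last_attested)
    return total_age / (num_devices * horizon)


# Two devices over three slots; device 0 is attested in slot 2.
avg_age = average_age_of_trust(num_devices=2, horizon=3, attestations={2: {0}})
```

Under this assumed definition, attesting a device more often lowers the average, which is the quantity a UAV schedule would try to reduce while limiting the time devices spend offline.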
NI May 25, 2020
Learning to Charge RF-Energy Harvesting Devices in WiFi Networks
Yizhou Luo, Kwan-Wu Chin
In this paper, we consider a solar-powered Access Point (AP) that is tasked with supporting both legacy (non-energy-harvesting) data users, such as laptops, and devices with Radio Frequency (RF)-energy harvesting and sensing capabilities. We propose two solutions that enable the AP to manage its harvested energy via transmit power control and also ensure devices perform sensing tasks frequently. Advantageously, our solutions are suitable for current wireless networks and do not require perfect channel gain information or non-causal energy arrival at devices. The first solution uses a deep Q-network (DQN), whilst the second uses Model Predictive Control (MPC) to control the AP's transmit power. Our results show that our DQN and MPC solutions improve energy efficiency and user satisfaction by 16% to 35% and 10% to 42%, respectively, compared to competing algorithms.