Haisheng Tan

CL
h-index: 18
Papers: 3
Citations: 3
Novelty: 52%
AI Score: 34

3 Papers

CL | Dec 16, 2025
Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents

Hongqiu Ni, Jiabao Zhang, Guopeng Li et al.

Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services such as Web APIs, create a mismatch between their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically focus on per-segment optimization, which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift the optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. It dynamically classifies requests as I/O-intensive or compute-intensive and uses an enhanced Highest Response Ratio Next (HRRN) policy to balance efficiency and fairness. Astraea also implements an adaptive KV cache manager that handles the agent state during I/O waits based on system memory pressure. Extensive experiments show that Astraea reduces average JCT by up to 25.5% compared to baseline methods. Moreover, our approach demonstrates strong robustness and stability under high load across various model scales.
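
The abstract names HRRN but does not describe how Astraea enhances it. As a point of reference only, the sketch below shows the textbook HRRN rule, where a request's priority is (waiting time + service time) / service time. The `Request` class, its fields, and the sample numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; field names are illustrative assumptions.
    name: str
    arrival: float      # time the request entered the queue
    est_service: float  # predicted service time, assumed to be given


def response_ratio(req: Request, now: float) -> float:
    """Textbook HRRN priority: (waiting time + service time) / service time."""
    wait = max(0.0, now - req.arrival)
    return (wait + req.est_service) / req.est_service


def pick_next(queue: list[Request], now: float) -> Request:
    """Return the queued request with the highest response ratio."""
    return max(queue, key=lambda r: response_ratio(r, now))


queue = [
    Request("short_new", arrival=9.0, est_service=1.0),
    Request("long_old", arrival=0.0, est_service=8.0),
    Request("mid", arrival=4.0, est_service=2.0),
]
chosen = pick_next(queue, now=10.0)
print(chosen.name)
```

Short jobs gain priority quickly, while long jobs gain it steadily as they wait, which is how plain HRRN trades efficiency against fairness.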

NI | Jan 12, 2025
Real-Time Neural-Enhancement for Online Cloud Gaming

Shan Jiang, Zhenhua Han, Haisheng Tan et al.

Online cloud gaming demands real-time, high-quality video transmission across variable wide-area networks (WANs). Neural-enhanced video transmission algorithms that employ super-resolution (SR) for video quality enhancement have proven effective in challenging WAN environments. However, these SR-based methods require intensive fine-tuning on the whole video, making them infeasible for diverse online cloud gaming. To address this, we introduce River, a cloud gaming delivery framework built on the observation that video segment features in cloud gaming are typically repetitive and redundant. This creates a significant opportunity to reuse fine-tuned SR models, reducing a fine-tuning latency of minutes to a query latency of milliseconds. To realize this idea, we design a practical system that addresses several challenges, such as model organization, online model scheduling, and transfer strategy. River first builds a content-aware encoder that fine-tunes SR models for diverse video segments and stores them in a lookup table. When delivering cloud gaming video streams online, River checks the video features and retrieves the most relevant SR models to enhance the frame quality. Meanwhile, if no existing SR model performs well enough for some video segments, River fine-tunes new models and updates the lookup table. Finally, to avoid the overhead of streaming model weights to the clients, River uses a prefetching strategy that predicts the models most likely to be retrieved. Our evaluation on real video game streaming demonstrates that River can reduce redundant training overhead by 44% and improve the Peak Signal-to-Noise Ratio (PSNR) by 1.81 dB compared to the SOTA solutions. Practical deployment shows that River meets real-time requirements, achieving approximately 720p at 20 fps on mobile devices.
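
The retrieve-or-fine-tune flow described in this abstract can be pictured with a small sketch. It assumes segment features are plain numeric vectors and uses a cosine-similarity threshold as a stand-in for River's "performs well enough" check, which the abstract does not specify. `ModelLookupTable`, `min_similarity`, and the model identifiers are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ModelLookupTable:
    """Hypothetical table mapping segment features to fine-tuned SR model ids."""

    def __init__(self, min_similarity=0.9):
        # Assumed threshold; a miss means a new model should be fine-tuned.
        self.min_similarity = min_similarity
        self.entries = []  # list of (feature_vector, model_id)

    def add(self, features, model_id):
        self.entries.append((list(features), model_id))

    def retrieve(self, features):
        """Return the closest model id, or None if nothing is similar enough."""
        best_id, best_sim = None, -1.0
        for stored, model_id in self.entries:
            sim = cosine(features, stored)
            if sim > best_sim:
                best_id, best_sim = model_id, sim
        return best_id if best_sim >= self.min_similarity else None


table = ModelLookupTable(min_similarity=0.9)
table.add([1.0, 0.0, 0.0], "sr_forest")
table.add([0.0, 1.0, 0.0], "sr_city")

hit = table.retrieve([0.9, 0.1, 0.0])   # close to a stored segment
miss = table.retrieve([0.0, 0.0, 1.0])  # unlike anything stored
print(hit, miss)
```

A `None` result corresponds to the case where a new model would be fine-tuned and added to the table with `add`.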

SP | Apr 2, 2020
Enhance the performance of navigation: A two-stage machine learning approach

Yimin Fan, Zhiyuan Wang, Yuanpeng Lin et al.

Real-time traffic navigation is an important capability in smart transportation technologies and has been extensively studied in recent years. Thanks to the rapid development of edge devices, collecting real-time traffic data is no longer a problem. However, real-time traffic navigation is still considered a particularly challenging problem because of the time-varying patterns of traffic flow and unpredictable accidents and congestion. To give accurate and reliable navigation results, it is important to predict future traffic flow (speed, congestion, volume, etc.) quickly and accurately. In this paper, we adopt the ideas of ensemble learning and develop a two-stage machine learning model to give accurate navigation results. We model the traffic flow as a time series and apply the XGBoost algorithm to obtain accurate predictions of future traffic conditions (1st stage). We then apply the Top-K Dijkstra algorithm to find a set of shortest paths from the given start point to the destination as candidates for the output optimal path. Using the prediction results from the 1st stage, we select one optimal path from the candidates as the output of the navigation algorithm. We show that our navigation algorithm can be greatly improved via EOPF (Enhanced Optimal Path Finding), which is based on a neural network (2nd stage). We show that our method can be over 7% better than the method without EOPF in many situations, which indicates the effectiveness of our model.
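
The candidate-then-select idea in this abstract can be sketched on a toy graph. This is an illustration under loud assumptions: candidates are found by brute-force enumeration of simple paths rather than a Top-K Dijkstra implementation, the predicted travel times are hard-coded rather than produced by XGBoost, and the final choice is a plain minimum rather than the paper's neural EOPF stage. All names and numbers are hypothetical.

```python
def simple_paths(graph, src, dst, path=None):
    """Yield every loop-free path from src to dst (fine for toy graphs only)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, {}):
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path)


def path_cost(path, weights):
    """Sum per-edge weights along a path."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))


def top_k_then_select(graph, predicted_time, src, dst, k=2):
    """Keep the k shortest paths by length, then pick the fastest predicted one."""
    lengths = {(u, v): d for u, nbrs in graph.items() for v, d in nbrs.items()}
    candidates = sorted(
        simple_paths(graph, src, dst), key=lambda p: path_cost(p, lengths)
    )[:k]
    return min(candidates, key=lambda p: path_cost(p, predicted_time))


# Edge lengths (static) and predicted travel times (stand-in for stage 1 output).
graph = {"A": {"B": 1.0, "C": 2.0}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
predicted_time = {
    ("A", "B"): 5.0, ("B", "D"): 5.0,  # short but congested
    ("A", "C"): 2.0, ("C", "D"): 2.0,  # longer but free-flowing
}

best = top_k_then_select(graph, predicted_time, "A", "D", k=2)
print(best)
```

Here the geometrically shortest route A-B-D loses to A-C-D once predicted congestion is taken into account, which is the motivation for ranking candidates with predicted rather than static costs.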