Wenhai Lin

h-index: 4
2 papers

2 Papers

15.1 NI Apr 14
Throughput Characterization of Wireless CSMA Networks With Arbitrary Sensing and Interference Topologies

Xinghua Sun, Wenhai Lin, Ruike Zhou

The performance analysis of wireless CSMA networks is notoriously difficult due to the intricate sensing and interference relationships among links. Even the fundamental problem of throughput characterization remains open when sensing and interference topologies are both arbitrary. In this paper, we develop a new analytical framework for throughput characterization in wireless CSMA networks with arbitrary sensing and interference topologies. The proposed framework yields explicit throughput expressions without relying on the commonly adopted zero-propagation-delay assumption. The key idea is to exploit the clique structure of the sensing graph to transform the original CSMA network into an equivalent multi-channel network, and then model its dynamics through a discrete-time Markov renewal process. In this way, the framework explicitly captures global coupling among links and enables analytical evaluation of how access parameters affect network performance. The proposed analysis is applied to several representative CSMA scenarios, including multi-BSS IEEE 802.11 networks with universal frequency reuse and ad-hoc topologies exhibiting hidden-terminal, exposed-terminal, and flow-in-the-middle effects. Simulation results show that, in dense deployments and in scenarios with strong coupling among link behaviors, the proposed model significantly outperforms existing analytical approaches in throughput estimation and enables more accurate determination of access parameters.
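To make the phrase "clique structure of the sensing graph" concrete, the following minimal sketch enumerates the maximal cliques of a small sensing graph. It is an illustration only and not the paper's construction: the three-link flow-in-the-middle topology, the link names `A`, `B`, `C`, and the helper `maximal_cliques` are all assumptions chosen for the example, and the abstract does not say how cliques are mapped to channels.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting).

    adj maps each link to the set of links it can sense (symmetric).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored links
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical flow-in-the-middle topology: B senses both A and C,
# while A and C cannot sense each other.
sensing = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
cliques = maximal_cliques(sensing)
print(sorted(sorted(c) for c in cliques))
```

In this toy graph the middle link belongs to both cliques while each outer link belongs to one, which is the kind of asymmetric coupling the flow-in-the-middle scenario refers to.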

CL Dec 3, 2025
AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving

Ying Wang, Zhen Jin, Jiexiong Xu et al.

As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficiency and optimizing service-level objectives (SLOs) are critical for enhancing user experience. To achieve this, inference systems must maximize request handling within latency constraints, referred to as increasing effective throughput. However, existing systems face two major challenges: (i) reliance on first-come-first-served (FCFS) scheduling, which causes severe head-of-line blocking and leads to queueing delays exceeding the SLOs for many requests; and (ii) a static batch token limit, which fails to adapt to fluctuating loads and hardware conditions. Both of these factors degrade effective throughput and service quality. This paper presents AugServe, an efficient inference framework designed to reduce queueing latency and enhance effective throughput for augmented LLM inference services. The core idea of AugServe is a two-stage adaptive request scheduling strategy. Specifically, AugServe combines the inference features of augmented LLM requests to optimize the order of scheduling decisions (stage I). These decisions are continuously refined with runtime information (stage II), adapting to both request characteristics and system capabilities. In addition, AugServe dynamically adjusts the token batching mechanism based on hardware status and real-time load, further enhancing throughput performance. Experimental results show that AugServe achieves 4.7-33.1x and 3.3-13.2x higher effective throughput than vLLM and InferCept, while reducing time-to-first-token (TTFT) by up to 96.3% and 95.0%, respectively.
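The head-of-line blocking problem and the notion of effective throughput can be illustrated with a toy single-server model. This is a sketch under stated assumptions, not AugServe's scheduler: all requests are assumed to arrive at time 0, the service times and the SLO value are made up, and shortest-first ordering is used only as a simple stand-in for a smarter-than-FCFS policy.

```python
def completed_within_slo(service_times, slo):
    """Serve requests one at a time in the given order (all arrive at t=0)
    and count how many finish no later than the SLO."""
    clock = 0.0
    met = 0
    for s in service_times:
        clock += s          # this request finishes at the new clock value
        if clock <= slo:
            met += 1
    return met


# Hypothetical workload: one long request arrives ahead of three short ones.
requests = [10.0, 1.0, 1.0, 1.0]
slo = 5.0

fcfs = completed_within_slo(requests, slo)           # arrival order
sjf = completed_within_slo(sorted(requests), slo)    # shortest first
print(fcfs, sjf)
```

Under FCFS the long request at the head of the queue pushes every completion past the SLO, so no request counts toward effective throughput; reordering lets the three short requests finish in time.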