88.6 DC May 6
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
Vikranth Srivatsa, Zijian He, Pu Guo et al.
LLM serving is increasingly multi-tenant: the same deployment must handle latency-critical interactive requests and more relaxed background workloads under a fixed GPU budget. This creates a tiered-SLO setting where maximizing overall goodput (requests that satisfy both TTFT and TPOT targets) is challenging because workload mix, request lengths, and load intensity vary over time. Existing systems mainly optimize request-level controls (e.g., queuing and batching) while keeping execution configuration largely static, which limits adaptation under multi-tier contention. We present Nitsum, a distributed LLM serving system that treats tensor parallelism (TP) as a first-class runtime control surface rather than a static deployment choice. Nitsum jointly optimizes TP level, prefill/decode GPU split, and request scheduling. To make frequent TP adaptation practical, Nitsum introduces TP-aware weight reuse and fast KV migration. Experiments on real traces and targeted microbenchmarks show that Nitsum improves SLO-compliant goodput over SoTA by up to 5.3 times.
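The goodput metric described above (requests that satisfy both their TTFT and TPOT targets) can be illustrated with a minimal sketch. The tier names, the SLO thresholds, and the `RequestResult` type are hypothetical values chosen for illustration; they are not taken from the Nitsum paper.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    tier: str        # hypothetical tier label, e.g. "interactive" or "background"
    ttft_ms: float   # time to first token, in milliseconds
    tpot_ms: float   # time per output token, in milliseconds


# Hypothetical per-tier targets: (TTFT limit, TPOT limit) in milliseconds.
SLO_TARGETS_MS = {
    "interactive": (500.0, 50.0),
    "background": (5000.0, 200.0),
}


def meets_slo(result, targets=SLO_TARGETS_MS):
    """A request counts toward goodput only if it meets BOTH targets of its tier."""
    ttft_limit, tpot_limit = targets[result.tier]
    return result.ttft_ms <= ttft_limit and result.tpot_ms <= tpot_limit


def goodput(results, window_s, targets=SLO_TARGETS_MS):
    """SLO-compliant requests completed per second over a measurement window."""
    compliant = sum(1 for r in results if meets_slo(r, targets))
    return compliant / window_s


if __name__ == "__main__":
    sample = [
        RequestResult("interactive", 300.0, 40.0),   # meets both targets
        RequestResult("interactive", 300.0, 80.0),   # misses TPOT
        RequestResult("background", 4000.0, 150.0),  # meets both targets
        RequestResult("background", 9000.0, 150.0),  # misses TTFT
    ]
    print(goodput(sample, window_s=2.0))
```

Because a request must meet both targets, lowering TTFT alone does not raise goodput if TPOT is still violated, which is why the abstract's joint treatment of the prefill/decode split and scheduling matters.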
IT Mar 6
On the Secrecy Performance of Continuous-Aperture Arrays Over Fading Channels
Xuan Yang, Chongjun Ouyang, Dongming Li et al.
The secrecy performance of continuous-aperture array (CAPA)-based wiretap channels is analyzed in terms of secrecy rate and secrecy outage probability (SOP). First, the system models of CAPA systems with maximum-ratio transmission under a Rayleigh fading channel are established, and approximate probability density functions for the signal-to-noise ratio (SNR) of the legitimate user Bob and of the eavesdropper Eve are derived using Mercer's theorem and Landau's eigenvalue theorem. Three scenarios are considered: a single Eve, multiple independent Eves, and multiple collaborative Eves. Next, expressions for the secrecy rate and SOP under these three scenarios are derived, and the high-SNR slope, high-SNR power offset, diversity order, and array gain in Bob's high-SNR region are obtained. It is then theoretically proven that, in all three scenarios, the CAPA system achieves the same high-SNR slope and the same diversity order, with the latter being equal to the spatial degrees of freedom. Moreover, the CAPA system with a single Eve has the smallest high-SNR power offset and the highest array gain, whereas the CAPA system with multiple collaborative Eves exhibits the largest high-SNR power offset and the lowest array gain. Finally, the theoretical analyses of secrecy rate, SOP, and high-SNR performance are validated by simulation results, and the CAPA systems achieve a higher secrecy rate and a lower SOP than spatially discrete array systems with half-wavelength antenna spacing.
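For reference, the quantities named in this abstract are commonly defined as follows in the physical-layer security literature. The symbols ($\gamma_B$, $\gamma_E$, $R_s$, $\bar{\gamma}_B$, $\mathcal{S}_\infty$, $\mathcal{L}_\infty$, $d$, $G_a$) are notation assumed here for illustration, and the paper's own definitions may differ in detail.

```latex
% Instantaneous secrecy rate, with Bob's SNR \gamma_B and Eve's SNR \gamma_E
C_s = \bigl[\log_2(1+\gamma_B) - \log_2(1+\gamma_E)\bigr]^{+},
\qquad [x]^{+} = \max(x, 0)

% Secrecy outage probability for a target secrecy rate R_s
P_{\mathrm{out}}(R_s) = \Pr\{C_s < R_s\}

% High-SNR approximations in Bob's average SNR \bar{\gamma}_B:
% slope \mathcal{S}_\infty and power offset \mathcal{L}_\infty for the average rate,
% diversity order d and array gain G_a for the outage probability
\bar{C}_s \approx \mathcal{S}_\infty\bigl(\log_2 \bar{\gamma}_B - \mathcal{L}_\infty\bigr),
\qquad
P_{\mathrm{out}} \approx \bigl(G_a\,\bar{\gamma}_B\bigr)^{-d}
```

Under these forms, a smaller power offset shifts the rate curve toward lower SNR, and a higher array gain shifts the outage curve the same way. This matches the ordering the abstract reports between the single-Eve and collaborative-Eve scenarios.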
DC May 8, 2024 Code
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practice is to include domain-specific instructions, illustrations of tool usage, and/or long context such as textbook chapters in prompts. As a result, many parts of prompts are repeated across requests. Recent works propose caching and reusing the KV state of prompts. However, they are all confined to single-GPU optimization, while production LLM serving systems are distributed by nature. This paper proposes Preble, the first distributed LLM serving platform that targets and optimizes for prompt sharing. We designed a distributed scheduling system that co-optimizes KV state reuse and computation load-balancing with a new scheduling algorithm and a hierarchical scheduling mechanism. Our evaluation of Preble with real workloads and request arrival patterns on two open-source LLMs shows that Preble outperforms the SOTA serving systems by 1.5X to 14.5X on average latency and 2X to 10X on p99 latency.
CV Aug 7, 2024
Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial
Xian Zhong, Zohaib Salahuddin, Yi Chen et al.
Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE-MLP) model for preoperative PHLF prediction. This model integrated counterfactuals and layerwise relevance propagation (LRP) to provide insights into its decision-making mechanism. Additionally, we proposed a methodological framework for evaluating the explainability of AI systems. This framework includes qualitative and quantitative assessments of explanations against recognized biomarkers, usability evaluations, and an in silico clinical trial. Our evaluations demonstrated that the model's explanation correlated with established biomarkers and exhibited high usability at both the case and system levels. Furthermore, results from the three-track in silico clinical trial showed that clinicians' prediction accuracy and confidence increased when AI explanations were provided.
CL Jun 8, 2021
Insight from NLP Analysis: COVID-19 Vaccines Sentiments on Social Media
Tao Na, Wei Cheng, Dongming Li et al.
Social media is an appropriate source for analyzing public attitudes towards COVID-19 vaccines and their various brands. Nevertheless, there are few relevant studies. In this study, we collected tweets posted by UK and US residents through the Twitter API during the pandemic and designed experiments to answer three main questions concerning vaccination. To identify the dominant public sentiment, we performed sentiment analysis with VADER and proposed a new method that accounts for each individual's influence. This allows us to go a step further in sentiment analysis and explain some of the fluctuations in the data. The results indicate that celebrities can lead opinion shifts on social media as vaccination progresses. Moreover, at the peak, nearly 40% of the population in both countries had a negative attitude towards COVID-19 vaccines. We also investigated people's opinions of different vaccine brands and found that the Pfizer vaccine is the most popular. By applying the sentiment analysis tool, we found that most people hold positive views of the COVID-19 vaccines manufactured by most brands. Finally, we carried out topic modelling using the LDA model. We found that residents of the two countries are willing to share their views and feelings concerning the vaccine. Several deaths have occurred after vaccination. Due to these negative events, US residents are more worried about the side effects and safety of the vaccine.