AI | Oct 6, 2025
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang, Jue Wang, Zhen et al.
Recent advances in large language models (LLMs) have opened up new directions for leveraging the collective expertise of multiple LLMs. These methods, such as Mixture-of-Agents, typically employ additional inference steps to generate intermediate outputs, which are then used to produce the final response. While multi-agent inference can enhance response quality, it can also significantly increase the time to first token (TTFT), which poses a challenge for latency-sensitive applications and hurts user experience. To address this issue, we propose staircase streaming for low-latency multi-agent inference. Instead of waiting for the complete intermediate outputs from previous steps, we begin generating the final response as soon as we receive partial outputs from these steps. Experimental results demonstrate that staircase streaming reduces TTFT by up to 93% while maintaining response quality.
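The scheduling idea in this abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation: the fixed chunk size, the one-token-per-step timing, the `prefill_cost` parameter, and all function names are assumptions made for illustration. It compares the number of steps before the final model can emit its first token when it waits for complete intermediate outputs versus when it starts on the first partial chunk.

```python
from typing import List


def ttft_sequential(proposer_lengths: List[int], prefill_cost: int = 1) -> int:
    """Steps until the first final token when the aggregator waits for every
    intermediate output to finish (proposers run in parallel, one token per
    step; this timing model is an assumption)."""
    return max(proposer_lengths) + prefill_cost


def ttft_staircase(proposer_lengths: List[int], chunk_size: int,
                   prefill_cost: int = 1) -> int:
    """Steps until the first final token when the aggregator starts as soon
    as every proposer has delivered its first chunk (or finished early)."""
    first_chunk_ready = max(min(chunk_size, n) for n in proposer_lengths)
    return first_chunk_ready + prefill_cost


def staircase_views(proposer_outputs: List[List[str]],
                    chunk_size: int) -> List[List[List[str]]]:
    """For each round, the partial intermediate outputs visible to the
    aggregator. Each round exposes one more chunk per proposer."""
    longest = max(len(tokens) for tokens in proposer_outputs)
    views = []
    for end in range(chunk_size, longest + chunk_size, chunk_size):
        views.append([tokens[:end] for tokens in proposer_outputs])
    return views
```

Under this toy model, three proposers producing 40, 30, and 50 tokens with a chunk size of 10 give a first final token after 11 steps instead of 51. These figures are properties of the sketch only and say nothing about the measured results reported in the paper.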
LG | Dec 10, 2023
Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions
Antoine Marot, David Rousseau, Zhen et al.
The conclusion of an AI challenge is not the end of its lifecycle: ensuring a long-lasting impact requires meticulous, well-organized post-challenge activities. This chapter covers the activities that follow the formal end of a challenge. It identifies target audiences for post-challenge initiatives, lists the multiple outputs of a challenge, and outlines methods for collecting and organizing them. The central part of the chapter is a template for a typical post-challenge paper, including possible graphs and advice on how to turn the challenge into a long-lasting benchmark.
SY | Jan 26, 2019
Estimating multi-year 24/7 origin-destination demand using high-granular multi-source traffic data
Wei Ma, Zhen, Qian
Dynamic origin-destination (OD) demand is central to transportation system modeling and analysis. The dynamic OD demand estimation (DODE) problem has been studied for decades, but most existing studies solve it for a typical day or several typical hours. There is a lack of methods that estimate high-resolution dynamic OD demand for a sequence of many consecutive days over several years (referred to as 24/7 OD in this research). Multi-year 24/7 OD demand would allow a better understanding of the characteristics of dynamic OD demand and its evolution and trends over the past few years, which is a critical input for modeling transportation system evolution and reliability. This paper presents a data-driven framework that estimates day-to-day dynamic OD demand using high-granular traffic counts and speed data collected over many years. The proposed framework statistically clusters daily traffic data into typical traffic patterns using t-Distributed Stochastic Neighbor Embedding (t-SNE) and k-means methods. A GPU-based stochastic projected gradient descent method is proposed to efficiently solve the multi-year 24/7 DODE problem. It is demonstrated that the new method efficiently estimates the 5-minute dynamic OD demand for every single day from 2014 to 2016 on I-5 and SR-99 in the Sacramento region. The resultant multi-year 24/7 dynamic OD demand reveals the daily, weekly, monthly, seasonal and yearly changes in travel demand in a region, implying intriguing demand characteristics over the years.
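As a rough illustration of stochastic projected gradient descent, the sketch below fits non-negative demand values to observed counts. It is a CPU-only toy in pure Python, not the paper's GPU-based method: the linear model `A x ≈ y` (a hypothetical assignment matrix `A` mapping OD demand `x` to link counts `y`), the squared-error loss, the step size, and the function name are all assumptions. Each step samples one observation, takes a gradient step on its squared residual, and projects the result back onto the feasible set `x >= 0`.

```python
import random
from typing import List


def projected_sgd(A: List[List[float]], y: List[float], steps: int = 2000,
                  lr: float = 0.05, seed: int = 0) -> List[float]:
    """Minimize sum_i (A[i] . x - y[i])^2 subject to x >= 0 using
    stochastic gradient steps followed by projection onto x >= 0."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        i = rng.randrange(len(A))  # sample one observation (one link count)
        residual = sum(a * v for a, v in zip(A[i], x)) - y[i]
        # Gradient of the sampled squared residual is 2 * residual * A[i];
        # max(0, .) is the Euclidean projection onto the non-negative orthant.
        x = [max(0.0, v - lr * 2.0 * residual * a) for a, v in zip(A[i], x)]
    return x


# Hypothetical example: two OD pairs observed through three link counts.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [3.0, 5.0, 8.0]
estimate = projected_sgd(A, y)
```

In this made-up example the counts are consistent with a demand of 3 and 5 on the two OD pairs, so the estimate approaches those values. Real DODE problems are far larger and typically underdetermined, which is part of why the abstract emphasizes an efficient GPU-based solver.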