Shulu Li

h-index1

3papers

5citations

3 Papers

23.2AIJul 6

LLM-as-a-Verifier: A General-Purpose Verification Framework

Jacky Kwok, Shulu Li, Pranav Atreya et al.

Scaling pre-training, post-training, and test-time compute have become the central paradigms for improving the capabilities of LLMs. In this work, we identify verification, the ability to determine the correctness of a solution, as a new scaling axis. To unlock this and demonstrate its effectiveness, we introduce LLM-as-a-Verifier, a general-purpose verification framework that provides fine-grained feedback for agentic tasks without requiring additional training. Unlike standard LM judges that prompt LLMs to produce discrete scores for candidate solutions, LLM-as-a-Verifier computes the expectation over the distribution of scoring token logits to generate continuous scores. This probabilistic formulation enables verification to scale along multiple dimensions: (1) score granularity, (2) repeated evaluation, and (3) criteria decomposition. In particular, we show that scaling the scoring granularity leads to better separation between positive and negative solutions, resulting in more calibrated comparisons. Moreover, scaling repeated evaluation and criteria decomposition consistently lead to additional gains in verification accuracy through variance and complexity reduction. We further introduce a cost-efficient ranking algorithm for selecting the best solution among candidates using the verifier's continuous scores. LLM-as-a-Verifier achieves state-of-the-art performance on Terminal-Bench V2 (86.5%), SWE-Bench Verified (78.2%), RoboRewardBench (87.4%), and MedAgentBench (73.3%). Beyond verification, the fine-grained signals from LLM-as-a-Verifier can also serve as a proxy for estimating task progress. We build an extension for Claude Code, enabling developers to monitor and improve their own agentic systems. Finally, we show that LLM-as-a-Verifier can provide dense feedback for RL, improving the sample efficiency of SAC and GRPO on robotics and mathematical reasoning benchmarks.

1.2DCJan 29

Maxwait: A Generalized Mechanism for Distributed Time-Sensitive Systems

Francesco Paladino, Shulu Li, Edward A. Lee

Distributed time-sensitive systems must balance timing requirements (availability) and consistency in the presence of communication delays and synchronization uncertainty. This paper presents maxwait, a simple coordination mechanism with surprising generality that makes these tradeoffs explicit and configurable. We demonstrate that this mechanism subsumes classical distributed system methods such as PTIDES, Chandy-and-Misra with or without null messages, Jefferson's Time-Warp, and Lamport's time-based fault detection, while enabling real-time behavior in distributed cyber-physical applications. The mechanism can also realize many commonly used distributed system patterns, including logical execution time (LET), publish and subscribe, actors, conflict-free replicated data types (CRDTs), and remote procedure calls with futures. More importantly, it adds to these mechanisms better control over timing, bounded time fault detection, and the option of making them more deterministic, all within a single semantic framework. Implemented as an extension of the Lingua Franca coordination language, maxwait enforces logical-time consistency when communication latencies are bounded and provides structured fault handling when bounds are violated.

2.2RODec 2, 2024

HPRM: High-Performance Robotic Middleware for Intelligent Autonomous Systems

Jacky Kwok, Shulu Li, Marten Lohstroh et al.

The rise of intelligent autonomous systems, especially in robotics and autonomous agents, has created a critical need for robust communication middleware that can ensure real-time processing of extensive sensor data. Current robotics middleware like Robot Operating System (ROS) 2 faces challenges with nondeterminism and high communication latency when dealing with large data across multiple subscribers on a multi-core compute platform. To address these issues, we present High-Performance Robotic Middleware (HPRM), built on top of the deterministic coordination language Lingua Franca (LF). HPRM employs optimizations including an in-memory object store for efficient zero-copy transfer of large payloads, adaptive serialization to minimize serialization overhead, and an eager protocol with real-time sockets to reduce handshake latency. Benchmarks show HPRM achieves up to 173x lower latency than ROS2 when broadcasting large messages to multiple nodes. We then demonstrate the benefits of HPRM by integrating it with the CARLA simulator and running reinforcement learning agents along with object detection workloads. In the CARLA autonomous driving application, HPRM attains 91.1% lower latency than ROS2. The deterministic coordination semantics of HPRM, combined with its optimized IPC mechanisms, enable efficient and predictable real-time communication for intelligent autonomous systems.