CL · May 28
PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration
Shuyu Zhang, Yaqi Shi, Lu Wang
LLM multi-agent systems often coordinate through natural-language dialogue or loosely structured shared memory, making intermediate state difficult to validate, attribute, and audit. We introduce PatchBoard, a schema-grounded collaboration architecture that replaces inter-agent dialogue with validated JSON Patch mutations over a shared structured state. An Architect agent constructs a task-specific schema and workflow rules, while a deterministic kernel validates each proposed state mutation against schema constraints, role-specific write contracts, and runtime invariants before committing it transactionally. On 630 matched ALFWorld episodes, PatchBoard achieves an 84.6% success rate, compared with 30.8% for LangGraph and 61.6% for Flock, while reducing tokens per successful task to 45.5k, compared with 368.3k and 64.2k, respectively.
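The validate-then-commit loop described above can be illustrated with a small sketch. It assumes a toy setting: the schema is a mapping from top-level field to Python type, each role's write contract is a list of JSON Pointer prefixes it may touch, and invariants are named predicates over the state. Only `add`, `replace`, and `remove` from RFC 6902 JSON Patch are supported. The names `Kernel`, `propose`, and `apply_op` are hypothetical and are not PatchBoard's actual interface. The patch is applied to a scratch copy and committed only if every check passes, so a rejected patch leaves the shared state untouched.

```python
import copy
from typing import Any


def parse_pointer(path: str) -> list:
    """Split an RFC 6901 JSON Pointer into reference tokens ("/a/b" -> ["a", "b"])."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer: {path!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_op(doc: Any, op: dict) -> None:
    """Apply one add/replace/remove operation (a subset of RFC 6902) in place."""
    *parents, key = parse_pointer(op["path"])
    target = doc
    for t in parents:
        target = target[int(t)] if isinstance(target, list) else target[t]
    kind = op["op"]
    if isinstance(target, list):
        if kind == "add" and key == "-":  # "-" appends to the end of an array
            target.append(op["value"])
            return
        idx = int(key)
        limit = len(target) if kind == "add" else len(target) - 1
        if not 0 <= idx <= limit:
            raise IndexError(f"array index {idx} out of range")
        if kind == "add":
            target.insert(idx, op["value"])
        elif kind == "replace":
            target[idx] = op["value"]
        elif kind == "remove":
            del target[idx]
        else:
            raise ValueError(f"unsupported op: {kind}")
    else:
        if kind == "replace" and key not in target:
            raise KeyError(key)
        if kind in ("add", "replace"):
            target[key] = op["value"]
        elif kind == "remove":
            del target[key]
        else:
            raise ValueError(f"unsupported op: {kind}")


class Kernel:
    """Hypothetical deterministic kernel: validate a proposed patch, then commit atomically."""

    def __init__(self, state: dict, schema: dict, write_contracts: dict, invariants: list):
        self.state = state
        self.schema = schema                    # field -> allowed type(s)
        self.write_contracts = write_contracts  # role -> writable path prefixes
        self.invariants = invariants            # (name, predicate) pairs
        self.audit_log = []                     # (role, patch) for every commit

    def _may_write(self, role: str, path: str) -> bool:
        return any(path == p or path.startswith(p + "/")
                   for p in self.write_contracts.get(role, ()))

    def propose(self, role: str, patch: list) -> tuple:
        # 1. Role-specific write contract: every op must target an allowed subtree.
        for op in patch:
            if not self._may_write(role, op.get("path", "")):
                return False, f"write contract: {role} may not write {op.get('path')}"
        # 2. Apply the whole patch to a scratch copy; committed state is untouched.
        candidate = copy.deepcopy(self.state)
        try:
            for op in patch:
                apply_op(candidate, op)
        except (KeyError, IndexError, ValueError, TypeError) as exc:
            return False, f"malformed patch: {exc!r}"
        # 3. Schema constraints: exactly the declared fields, each of the declared type.
        if set(candidate) != set(self.schema):
            return False, "schema: unexpected or missing fields"
        for field, typ in self.schema.items():
            if not isinstance(candidate[field], typ):
                return False, f"schema: {field} has wrong type"
        # 4. Runtime invariants over the candidate state.
        for name, holds in self.invariants:
            if not holds(candidate):
                return False, f"invariant violated: {name}"
        # 5. Commit: swap in the candidate and record the patch for auditing.
        self.state = candidate
        self.audit_log.append((role, copy.deepcopy(patch)))
        return True, "committed"


kernel = Kernel(
    state={"plan": [], "status": "open", "result": None},
    schema={"plan": list, "status": str, "result": (str, type(None))},
    write_contracts={"planner": ["/plan"], "executor": ["/status", "/result"]},
    invariants=[("done implies result",
                 lambda s: s["status"] != "done" or s["result"] is not None)],
)
print(kernel.propose("planner", [{"op": "add", "path": "/plan/-", "value": "go to desk 1"}]))
# (True, 'committed')
print(kernel.propose("executor", [{"op": "replace", "path": "/status", "value": "done"}]))
# (False, 'invariant violated: done implies result')
```

Because every accepted change is a self-describing patch tagged with its author role, the `audit_log` in this sketch doubles as an attribution trail, which is the property the abstract contrasts with free-form dialogue.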
CL · Apr 17
EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation
Shuyu Zhang, Lingfeng Pan, Qicheng Wang et al.
Speculative decoding accelerates large language model inference via a draft-then-verify paradigm, yet the output projection layer becomes a bottleneck as vocabulary sizes scale. While existing static pruning methods effectively reduce this overhead, they suffer from precipitous drops in acceptance rate in specialized domains or topic-switching scenarios due to their inability to capture dynamic distribution shifts. To address this, we introduce EvoSpec, a framework that enables real-time evolution of the draft model through dynamic vocabulary and parameter adaptation. Unlike static or purely retrieval-based approaches, EvoSpec employs a context-aware mechanism that retrieves critical long-tail tokens via efficient semantic and statistical indexing. Furthermore, we propose a lightweight online alignment strategy utilizing curriculum learning to continually minimize the distributional gap between the draft and target models. Extensive evaluations across specialized domains (coding, law, and medicine) confirm that EvoSpec overcomes the limitations of static baselines. On EAGLE-3, it achieves a 1.13× speedup in these settings over the state-of-the-art static baseline FR-Spec, with 27% lower memory overhead than standard online adaptation.
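To make the bottleneck and the fix concrete, here is a toy sketch of (a) a draft step that projects only onto a subset of LM-head rows, (b) a dynamic subset built from a static high-frequency list plus tokens that recur in the recent context, and (c) greedy draft-then-verify acceptance. The helpers `build_draft_vocab`, `draft_next_token`, and `greedy_verify` are hypothetical, and the simple frequency count over a context window is an assumed stand-in: it is not EvoSpec's semantic and statistical indexing, and the online alignment step is not modeled.

```python
from collections import Counter

import numpy as np


def build_draft_vocab(static_ids, context_ids, window=256, k_ctx=64):
    """Hypothetical dynamic vocabulary: the static frequent ids plus the k_ctx
    most frequent ids among the last `window` context tokens."""
    recent = Counter(context_ids[-window:])
    dynamic = [tok for tok, _ in recent.most_common(k_ctx)]
    return np.array(sorted(set(static_ids) | set(dynamic)), dtype=np.int64)


def draft_next_token(hidden, lm_head, vocab_ids):
    # Project onto the selected LM-head rows only: O(|S| * d) work instead of O(|V| * d).
    sub_logits = lm_head[vocab_ids] @ hidden
    return int(vocab_ids[np.argmax(sub_logits)])


def greedy_verify(draft_tokens, target_tokens):
    """Accept the longest prefix where the draft equals the target model's argmax,
    then emit the target's own token at the first mismatch (or the bonus token
    after a full accept). target_tokens has len(draft_tokens) + 1 entries."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        accepted.append(d)
    accepted.append(target_tokens[len(accepted)])
    return accepted


rng = np.random.default_rng(0)
V, d = 1000, 64
lm_head = rng.normal(size=(V, d))
lm_head /= np.linalg.norm(lm_head, axis=1, keepdims=True)  # unit-norm rows
static_ids = list(range(100))         # stand-in for a frequency-pruned static vocabulary
context = [512, 513, 512, 900, 512]   # "domain" tokens that fall outside the static set
vocab = build_draft_vocab(static_ids, context)
hidden = lm_head[512]                 # toy hidden state that points at token 512
print(draft_next_token(hidden, lm_head, vocab))  # 512
print(greedy_verify([5, 7, 9], [5, 7, 2, 4]))    # [5, 7, 2]
```

With the static list alone, token 512 can never be drafted, so any position where the target emits it is a guaranteed rejection; adding context tokens to the draft vocabulary is one simple way to recover such acceptances at a small projection cost.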
ML · Sep 10, 2025
PEHRT: A Common Pipeline for Harmonizing Electronic Health Record data for Translational Research
Jessica Gronsbell, Vidul Ayakulangara Panickan, Chris Lin et al.
Integrative analysis of multi-institutional Electronic Health Record (EHR) data enhances the reliability and generalizability of translational research by leveraging larger, more diverse patient cohorts and incorporating multiple data modalities. However, harmonizing EHR data across institutions poses major challenges due to data heterogeneity, semantic differences, and privacy concerns. To address these challenges, we introduce PEHRT, a standardized pipeline for efficient EHR data harmonization consisting of two core modules: (1) data pre-processing and (2) representation learning. PEHRT maps EHR data to standard coding systems and uses advanced machine learning to generate research-ready datasets without requiring individual-level data sharing. Our pipeline is also data model agnostic and designed for streamlined execution across institutions based on our extensive real-world experience. We provide a complete suite of open source software, accompanied by a user-friendly tutorial, and demonstrate the utility of PEHRT in a variety of tasks using data from diverse healthcare systems.
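The two modules, and the point that no individual-level data needs to be shared, can be illustrated with a toy sketch. Each site maps its local codes to a shared standard vocabulary and then releases only an aggregated code co-occurrence matrix; the pooled matrix is turned into code embeddings with positive pointwise mutual information (PPMI) followed by a truncated SVD. PPMI plus SVD is one widely used co-occurrence embedding method, assumed here for illustration; PEHRT's actual representation-learning module may differ. All codes, mappings, and function names below are hypothetical.

```python
from itertools import combinations

import numpy as np


def site_cooccurrence(visits, vocab, local_to_std):
    """Module 1 (toy): map local codes to standard codes, then count how often two
    standard codes appear in the same visit. Only this matrix leaves the site."""
    index = {c: i for i, c in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for visit in visits:
        std = {local_to_std[c] for c in visit if c in local_to_std}  # map and deduplicate
        for a, b in combinations(sorted(std), 2):
            i, j = index[a], index[b]
            C[i, j] += 1
            C[j, i] += 1
    return C


def ppmi_svd_embeddings(C, dim):
    """Module 2 (toy): PPMI transform of pooled counts, then rank-`dim` SVD."""
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row @ col))
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    U, S, _ = np.linalg.svd(ppmi)
    return U[:, :dim] * np.sqrt(S[:dim])


vocab = ["STD:diabetes", "STD:metformin", "STD:hba1c", "STD:asthma"]
map_a = {"dx_dm2": "STD:diabetes", "rx_met": "STD:metformin",
         "lab_a1c": "STD:hba1c", "dx_asthma": "STD:asthma"}
map_b = {"DM2": "STD:diabetes", "METFORMIN": "STD:metformin",
         "HBA1C": "STD:hba1c", "ASTHMA": "STD:asthma"}
site_a = [["dx_dm2", "rx_met", "lab_a1c"], ["dx_dm2", "lab_a1c"], ["dx_asthma"]]
site_b = [["DM2", "METFORMIN"], ["DM2", "METFORMIN", "HBA1C"], ["ASTHMA", "DM2"]]

# Sites share summary counts only; the coordinating center pools and embeds them.
C = site_cooccurrence(site_a, vocab, map_a) + site_cooccurrence(site_b, vocab, map_b)
emb = ppmi_svd_embeddings(C, dim=2)
print(C[0, 1], emb.shape)  # 3.0 (4, 2)
```

Harmonizing codes before counting is what lets the two sites' matrices be summed at all: `dx_dm2` and `DM2` land in the same row only because both map to `STD:diabetes`.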
ML · Nov 29, 2024
Another look at inference after prediction
Jessica Gronsbell, Jianhui Gao, Yaqi Shi et al.
From structural biology to epidemiology, predictions from machine learning (ML) models increasingly complement costly gold-standard data to enable faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions together with a small amount of gold-standard data. The goals of PB inference are two-fold: (i) to mitigate bias from errors in predictions and (ii) to improve efficiency relative to classical inference using only the gold-standard data. While early PB inference methods focused on bias, their ability to enhance efficiency remains a focus of ongoing research. We revisit a foundational PB inference method and show that a simple modification can be applied to guarantee provable improvements in efficiency. In doing so, we establish new connections between augmented inverse probability weighted estimators (AIPW) and several recently proposed PB inference methods with a similar focus. The utility of our proposal, which leverages prediction-based outcomes to enhance efficiency, is demonstrated through extensive simulation studies and an application to real data from the UK Biobank. Further, we contextualize PB inference by drawing connections to historical literature from economics and statistics, highlighting how classic methods directly inform this contemporary problem.
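The bias/efficiency trade-off can be seen in the simplest PB inference problem, estimating a mean $E[Y]$ from $n$ gold-standard pairs $(Y_i, f_i)$ and $N$ unlabeled predictions $\tilde f_j$. The sketch below uses the estimator $\hat\theta(\lambda) = \bar Y - \lambda(\bar f_{\text{lab}} - \bar f_{\text{unlab}})$, where the correction term has mean zero when the labeled and unlabeled samples come from the same population, so the predictions cannot introduce bias. Choosing $\lambda$ to minimize the plug-in variance ensures that the estimated variance never exceeds that of the classical estimator ($\lambda = 0$). This is the generic power-tuned, AIPW-style estimator (as in PPI++), assumed here for illustration; it is not claimed to be the specific modification proposed in the paper, and `pb_mean` is a hypothetical name.

```python
import numpy as np


def pb_mean(y_lab, f_lab, f_unlab):
    """Prediction-based estimate of E[Y] with a variance-minimizing weight lam.

    theta(lam) = mean(y_lab) - lam * (mean(f_lab) - mean(f_unlab))
    lam = 0 is the classical estimator that uses gold-standard data only;
    lam = 1 is the plain bias-corrected (prediction-powered) estimator.
    """
    y_lab, f_lab, f_unlab = (np.asarray(a, dtype=float) for a in (y_lab, f_lab, f_unlab))
    n, N = len(y_lab), len(f_unlab)
    v_y = np.var(y_lab, ddof=1)
    v_fl = np.var(f_lab, ddof=1)
    v_fu = np.var(f_unlab, ddof=1)
    c_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]

    def est_var(lam):
        # Plug-in variance; the labeled and unlabeled samples are independent.
        return (v_y - 2 * lam * c_yf + lam**2 * v_fl) / n + lam**2 * v_fu / N

    lam = (c_yf / n) / (v_fl / n + v_fu / N)  # minimizer of the quadratic est_var
    theta = y_lab.mean() - lam * (f_lab.mean() - f_unlab.mean())
    return {"theta": theta, "se": np.sqrt(est_var(lam)), "lam": lam,
            "classical": y_lab.mean(), "classical_se": np.sqrt(est_var(0.0))}


def predict(x):
    # A biased but informative stand-in for an ML model's prediction of Y.
    return 0.5 + 1.8 * x


rng = np.random.default_rng(1)
n, N = 200, 20_000
x_lab, x_unlab = rng.normal(size=n), rng.normal(size=N)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=n)  # true E[Y] = 1.0
res = pb_mean(y_lab, predict(x_lab), predict(x_unlab))
print(res)
```

Even though `predict` is systematically off, the estimate remains centered on the truth, and its standard error is markedly smaller than the gold-standard-only one because the predictions are strongly correlated with $Y$. When the predictions are uninformative, the tuned $\lambda$ shrinks toward 0 and the estimator falls back to the classical one.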