cs.DC · Computer Science

Distributed Computing

Distributed systems, parallel computing, cloud

28.3 · DC · Mar 13

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

Bangjun Xiao, Yihao Zhao, Xiangwei Deng et al.

This work addresses resource-management inefficiencies in cloud-based agentic RL systems, offering significant performance gains and cost savings. It is incremental in that it builds on existing frameworks, adding a novel orchestration approach.

17.3 · AI · May 22

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

Shubham Agarwal, Alexander Krentsel, Shu Liu et al.

For developers of safety-critical distributed systems, IDS dramatically reduces the effort and cost of formal verification, which previously required months to years of expert work.

18.7 · RO · Mar 11

Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure

Chen Zhou, Haoran Sun, Hedan Yang et al.

This work addresses infrastructure bottlenecks for researchers and developers in embodied AI, though it is incremental, building on existing frameworks such as LeRobot.

14.9 · DC · Mar 10

ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios

Xinyi Hu, Yuhao Shen, Baolin Zhang et al.

This work addresses the verification-compute bottleneck in production-grade LLM serving, offering a novel solution for high-concurrency scenarios.

31.5 · CV · May 18

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Yukang Chen, Luozhou Wang, Wei Huang et al.

This work addresses speed and memory bottlenecks in long video generation for practitioners, offering a practical system-level solution.

16.9 · LG · May 18

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Zhongzhu Zhou, Donglin Zhuang, Jisen Li et al.

This work addresses the challenge of accurate and deployable INT2 KV cache quantization for long-context LLM serving, offering a practical solution that integrates with existing frameworks.

9.1 · CL · Apr 20 · Code 2

DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models

You-Liang Huang, Xinhao Huang, Chengxi Liao et al.

For researchers and engineers scaling LLMs via decomposition, DeInfer provides a practical solution to a critical performance bottleneck.

29.9 · AI · May 7

Safactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligence

Xinquan Chen, Zhenyun Yin, Shan He et al.

For researchers building autonomous agents, Safactory offers a unified framework for systematic risk discovery and continuous improvement, but the contribution is primarily architectural and lacks empirical validation.

14.7 · LG · Mar 12 · Code 130

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

Jae-Won Chung, Jeff J. Ma, Jisang Ahn et al.

This work addresses efficient, scalable serving of complex multimodal models; it is incremental, building on existing distributed systems and Kubernetes.

15.7 · DC · Mar 18

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression

Ruibo Fan, Xiangrui Yu, Xinglin Pan et al.

This work addresses slow, memory-intensive LLM serving for AI practitioners, offering a novel co-designed solution that provides both compression and acceleration.

17.0 · DC · Apr 9 · Code 5

SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism

Yuhao Shen, Junyi Shen, Quan Kong et al.

This work addresses inference latency for users of large language models; it is a significant but incremental improvement over existing speculative decoding methods.

20.5 · AI · May 7 · Code 87

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

Keisuke Kamahori, Shihang Li, Simon Peter et al.

This work challenges the paradigm of general-purpose LLM serving stacks by proposing generation-time specialization, which could benefit system builders and researchers dealing with diverse model architectures, workloads, and hardware.

13.1 · DC · May 24 · Code 115

Efficient Distributed MLLM Training with Cornstarch

Insu Jang, Runyu Lu, Nikhil Bansal et al.

For researchers and engineers training large multimodal models, Cornstarch provides a more efficient distributed training approach tailored to the heterogeneity of MLLMs.

24.1 · DC · Apr 1

OSGym: Scalable OS Infra for Computer Use Agents

Zengyi Qin, Jinyuan Chen, Yunze Man et al.

This work addresses the resource-intensive infrastructure needed for computer-use agent research, offering a scalable solution.

27.7 · DC · May 11

Accelerating Compound LLM Training Workloads with Maestro

Xiulong Yuan, Hongqing Chen, Jiaxuan Peng et al.

For practitioners training complex multi-component LLMs, Maestro provides a practical framework that significantly improves GPU utilization and throughput over monolithic approaches.

14.3 · COMP-PH · Mar 11 · Code 4

SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

Yadi Cao, Sicheng Lai, Jiahe Huang et al.

This work addresses the gap in cost-aware evaluation for physics simulations, providing a practical benchmark for researchers and practitioners, though it is incremental in that it extends existing benchmarking approaches.

7.8 · DC · Mar 26 · Code

eBeeMetrics: An eBPF-based Library Framework for Feedback-free Observability of QoS Metrics

Muntaka Ibnath, Mohammadreza Rezvani, Daniel Wong

This work addresses the complexity and overhead of instrumenting applications for QoS feedback in system management, though it appears incremental as an eBPF-based observability tool.

14.7 · DC · Mar 10 · Code 704

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Shuo Yang, Haocheng Xi, Yilong Zhao et al. · Tsinghua

This work enables k-means as an efficient online primitive for modern AI systems, addressing a domain-specific bottleneck in GPU workloads.

13.8 · DC · Mar 12

KernelFoundry: Hardware-aware evolutionary GPU kernel optimization

Nina Wiedemann, Quentin Leboutet, Michael Paulitsch et al.

This work addresses efficient GPU kernel optimization for developers and researchers, offering a novel method that improves performance over existing approaches.

21.8 · DC · May 7

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

Wei Gao, Yuheng Zhao, Dilxat Muhtar et al.

For LLM post-training systems, this addresses the inefficiency of static GPU provisioning for agentic RL rollouts.