100.0 DC Apr 1
OSGym: Scalable OS Infra for Computer Use Agents
Zengyi Qin, Jinyuan Chen, Yunze Man et al.
Training computer use agents requires full-featured OS sandboxes with GUI environments, which consume substantial hardware resources as the number of sandboxes scales. Stochastic errors arising from diverse software execution within these sandboxes further demand robust infrastructure design and reliable error recovery. We present OSGym, a scalable OS environment infrastructure for computer use agents, built around these key optimization strategies: (1) Decentralized OS state management, which isolates failures to individual replicas and significantly enhances overall system reliability; (2) Hardware-aware OS replica orchestration, which addresses CPU-bounded scaling bottlenecks and substantially reduces compute overhead; (3) KVM virtualization with copy-on-write disk management, which shares a common bootable disk across VM instances and provisions only instance-specific modifications, reducing physical disk consumption by 88% and increasing disk provisioning speed by 37 times; and (4) Robust container pool with multi-layer fault recovery. Together, these optimizations yield strong scalability and resource efficiency: OSGym manages over a thousand OS replicas under constrained resources, supports parallel trajectory generation at 1420 multi-turn trajectories per minute, and reduces per-replica cost to 0.2-0.3 USD per day, a 90% reduction over standard deployment. Our experiments validate OSGym across end-to-end pipelines for data collection and training for computer use agents. We believe OSGym establishes a new foundation for scalable, general-purpose computer use agent research.
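The disk saving from strategy (3) can be illustrated with simple accounting: one shared base image plus a small per-instance overlay, compared against a full copy per VM. The sketch below is only an illustration of that arithmetic, not OSGym's implementation; the function names and the example sizes (a 20 GB base image, 2 GB of instance-specific changes) are hypothetical and do not come from the paper.

```python
def full_copy_disk_gb(base_gb: float, replicas: int) -> float:
    """Disk used when every VM gets its own full copy of the bootable image."""
    return base_gb * replicas


def cow_disk_gb(base_gb: float, delta_gb: float, replicas: int) -> float:
    """Disk used with copy-on-write: one shared base image plus one
    overlay per VM that stores only instance-specific modifications."""
    return base_gb + delta_gb * replicas


def saving_fraction(base_gb: float, delta_gb: float, replicas: int) -> float:
    """Fraction of physical disk saved by copy-on-write versus full copies."""
    full = full_copy_disk_gb(base_gb, replicas)
    return 1.0 - cow_disk_gb(base_gb, delta_gb, replicas) / full


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the shape of the calculation.
    print(f"{saving_fraction(base_gb=20.0, delta_gb=2.0, replicas=1000):.1%}")
```

For a large number of replicas the saving approaches `1 - delta_gb / base_gb`, so the reported reduction depends mainly on how small each instance's modifications are relative to the shared image.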
CR Nov 22, 2023
A Survey of Blockchain, Artificial Intelligence, and Edge Computing for Web 3.0
Jianjun Zhu, Fan Li, Jinyuan Chen
Web 3.0, as the third generation of the World Wide Web, aims to solve contemporary problems of trust, centralization, and data ownership. Driven by the latest advances in cutting-edge technologies, Web 3.0 is moving towards a more open, decentralized, intelligent, and interconnected network. However, increasingly widespread data breaches have raised awareness of online privacy and security of personal data. Additionally, since Web 3.0 is a sophisticated and complex convergence, the technical details behind it are not as clear as the characteristics it presents. In this survey, we conduct an in-depth exploration of Web 3.0 from the perspectives of blockchain, artificial intelligence, and edge computing. Specifically, we begin with summarizing the evolution of the Internet and providing an overview of these three key technological factors. Afterward, we provide a thorough analysis of each technology separately, including its relevance to Web 3.0, key technology components, and practical applications. We also propose decentralized storage and computing solutions by exploring the integration of technologies. Finally, we highlight the key challenges alongside potential research directions. Through the combination and mutual complementation of multiple technologies, Web 3.0 is expected to return more control and ownership of data and digital assets back to users.
CV Oct 12, 2025 Code
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
Qunzhong Wang, Jie Liu, Jiajun Liang et al.
Recent advancements in multimodal reward models (RMs) have substantially improved post-training for visual generative models. However, current RMs face inherent limitations: (1) visual inputs consume large context budgets, forcing fewer frames and causing loss of fine-grained details; and (2) all visual information is packed into the initial prompt, exacerbating hallucination and forgetting during chain-of-thought reasoning. To overcome these issues, we introduce VideoReward Thinker (VR-Thinker), a thinking-with-image framework that equips the RM with visual reasoning operations (e.g., select frame) and a configurable visual memory window. This allows the RM to actively acquire and update visual evidence within context limits, improving reasoning fidelity and reliability. We activate visual reasoning via a reinforcement fine-tuning pipeline: (i) Cold Start with curated visual chain-of-thought data to distill basic reasoning skills and operation formatting; (ii) select samples whose per-dimension and overall judgments are all correct, then conduct Rejection sampling Fine-Tuning on these high-quality traces to further enhance reasoning; and (iii) apply Group Relative Policy Optimization (GRPO) to strengthen reasoning. Our approach delivers state-of-the-art accuracy among open-source models on video preference benchmarks, especially for longer videos: a 7B VR-Thinker achieves 80.5% on VideoGen Reward, 82.3% on GenAI-Bench, and 75.6% on MJ-Bench-Video. These results validate the effectiveness and promise of thinking-with-image multimodal reward modeling.
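Step (iii) relies on GRPO, whose core idea is to score each sampled response relative to the other responses drawn for the same prompt. The sketch below shows that group-relative normalization in its generic form; it is not VR-Thinker's training code. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration, and the abstract does not specify the reward definition.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    are credited only for being better or worse than their siblings.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group: 1.0 if the reward model's preference judgment
    # was correct for that sampled trace, 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason curated, mixed-outcome samples matter for this stage.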
IT Sep 23, 2020
Fundamental Limits of Byzantine Agreement
Jinyuan Chen
Byzantine agreement (BA) is a distributed consensus problem where $n$ processors want to reach agreement on an $\ell$-bit message or value, but up to $t$ of the processors are dishonest or faulty. The challenge of this BA problem lies in achieving agreement despite the presence of dishonest processors who may arbitrarily deviate from the designed protocol. The quality of a BA protocol is measured primarily by the following three parameters: the number of processors $n$ required as a function of $t$ (resilience); the number of rounds (round complexity, denoted by $r$); and the total number of communication bits (communication complexity, denoted by $b$). For any error-free BA protocol, the known lower bounds on those three parameters are $n \geq 3t+1$, $r \geq t+1$ and $b \geq \Omega(\max\{n\ell, nt\})$, respectively, where a protocol that is guaranteed to be correct in all executions is said to be error free. In this work, using coding theory together with graph theory and linear algebra, we design a coded BA protocol (termed COOL) that achieves consensus on an $\ell$-bit message with optimal resilience, asymptotically optimal round complexity, and asymptotically optimal communication complexity when $\ell \geq t\log t$, simultaneously. The proposed COOL is an error-free and deterministic BA protocol that does not rely on cryptographic techniques. It is secure against a computationally unbounded adversary. With the performance achieved by the proposed COOL and the known lower bounds, we characterize the optimal communication complexity exponent as \[\beta^*(\alpha,\delta)=\max\{1+\alpha,1+\delta\}\] for $\beta = \lim_{n\to\infty}\log b/\log n$, $\alpha = \lim_{n\to\infty} \log \ell/\log n$ and $\delta = \lim_{n\to\infty} \log t/\log n$. This work reveals that coding is an effective approach for achieving the fundamental limits of Byzantine agreement and its variants.
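The three lower bounds and the exponent formula above are easy to evaluate for concrete parameters. The following sketch is a plain calculator for those quantities, not an implementation of COOL; the function names are hypothetical, and the communication bound is reported only up to the constant hidden in the $\Omega(\cdot)$ notation.

```python
def max_tolerable_faults(n: int) -> int:
    """Largest t satisfying the resilience bound n >= 3t + 1."""
    return (n - 1) // 3


def min_rounds(t: int) -> int:
    """Round-complexity lower bound r >= t + 1 for error-free protocols."""
    return t + 1


def comm_lower_bound_order(n: int, ell: int, t: int) -> int:
    """Order of the communication lower bound, max{n*ell, n*t} bits,
    ignoring the constant factor hidden by the Omega notation."""
    return max(n * ell, n * t)


def optimal_comm_exponent(alpha: float, delta: float) -> float:
    """Optimal exponent beta* = max{1 + alpha, 1 + delta}, where alpha and
    delta are the growth exponents of ell and t with respect to n."""
    return max(1.0 + alpha, 1.0 + delta)


if __name__ == "__main__":
    n, ell = 10, 1024  # example values chosen for illustration
    t = max_tolerable_faults(n)
    print(t, min_rounds(t), comm_lower_bound_order(n, ell, t))
    print(optimal_comm_exponent(alpha=2.0, delta=1.0))
```

The exponent follows directly from the bound: taking $\log(\cdot)/\log n$ of $n\ell$ gives $1+\alpha$ and of $nt$ gives $1+\delta$, so the larger of the two dominates as $n$ grows.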