63.7 | AI | May 11 | Code
LLM4Branch: Large Language Model for Discovering Efficient Branching Policies of Integer Programs
Zhinan Hou, Xingchen Li, Yankai Zhang et al.
Efficient branching policies are essential for accelerating Mixed Integer Linear Programming (MILP) solvers. Their design has long relied on hand-crafted heuristics, and machine learning has recently emerged as a promising paradigm for automating this process. However, existing learning-based methods are often hindered by their dependence on expensive expert demonstrations and by the gap between training objectives and the solver's end-to-end performance. In this work, we propose LLM4Branch, a novel framework that leverages Large Language Models (LLMs) to automate the discovery of efficient branching policies. Specifically, the discovered policy is an executable program consisting of a program skeleton generated by the LLM and a parameter vector, which is optimized via a zeroth-order method on a few instances using end-to-end performance feedback. Extensive experiments on standard MILP benchmarks demonstrate that LLM4Branch establishes a new state of the art among CPU-based methods and achieves performance competitive with advanced GPU-based models. Code is available at https://github.com/hzn18/LLM4Branch.
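To make the "program skeleton plus parameter vector" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical linear scoring skeleton (`branching_score`), a simple random-perturbation search as one example of a zeroth-order method, and a toy quadratic cost (`toy_cost`) standing in for measured solver performance. All names and constants here are illustrative assumptions.

```python
import random

def branching_score(features, theta):
    """Hypothetical skeleton: a linear score over candidate-variable features."""
    return sum(w * f for w, f in zip(theta, features))

def pick_branching_variable(candidates, theta):
    """Return the index of the candidate with the highest score."""
    return max(range(len(candidates)),
               key=lambda i: branching_score(candidates[i], theta))

def zeroth_order_search(cost, theta0, iters=200, sigma=0.3, seed=0):
    """Gradient-free search: perturb theta and keep a trial only if cost drops."""
    rng = random.Random(seed)
    best, best_cost = list(theta0), cost(theta0)
    for _ in range(iters):
        trial = [t + rng.gauss(0.0, sigma) for t in best]
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best, best_cost

def toy_cost(theta):
    """Assumed stand-in for end-to-end solve time; a real run would call a solver."""
    target = [1.0, -0.5, 2.0]
    return sum((t - g) ** 2 for t, g in zip(theta, target))
```

In a real setting, `toy_cost` would be replaced by a function that runs the solver with the parameterized policy on a few instances and returns a measured quantity such as solve time. Only cost evaluations are needed, which is what makes the search zeroth-order.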
AI | Jan 28 | Code
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
Le Zhang, Yixiong Xiao, Xinjiang Lu et al.
Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms, supporting computer-use and phone-use scenarios. Building an effective GUI agent model relies on two factors: (1) high-quality data and (2) effective training methods. To address these, we introduce a carefully engineered data-construction pipeline and a decoupled training paradigm. For data construction, we leverage rigorously curated open-source datasets and introduce a novel automated synthesis framework that integrates bottom-up autonomous exploration with top-down taxonomy-guided generation to create high-fidelity synthetic data. For training, to better leverage these data, we adopt a two-stage strategy: Supervised Fine-Tuning (SFT) to establish fundamental interaction syntax, followed by Group Relative Policy Optimization (GRPO) to improve spatial grounding and sequential planning. To balance computational efficiency with agentic reasoning capacity, OmegaUse is built on a Mixture-of-Experts (MoE) backbone. To evaluate cross-terminal capabilities in an offline setting, we introduce OS-Nav, a benchmark suite spanning multiple operating systems: ChiM-Nav, targeting Chinese Android mobile environments, and Ubu-Nav, focusing on routine desktop interactions on Ubuntu. Extensive experiments show that OmegaUse is highly competitive across established GUI benchmarks, achieving a state-of-the-art (SOTA) score of 96.3% on ScreenSpot-V2 and a leading 79.1% step success rate on AndroidControl. OmegaUse also performs strongly on OS-Nav, reaching 74.24% step success on ChiM-Nav and 55.9% average success on Ubu-Nav.
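The GRPO stage mentioned above scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage step, under the assumption that rewards are standardized by the group mean and population standard deviation. The function name and the `eps` constant are illustrative and are not taken from the OmegaUse report.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sd = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sd + eps) for r in rewards]

# Example: four rollouts for one GUI task, rewarded 1.0 on success, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative, so no separate value network is needed to provide a baseline. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.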
CV | May 19, 2025 | Code
MAGI-1: Autoregressive Video Generation at Scale
Sand. ai, Hansi Teng, Hongyu Jia et al.
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
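As a rough illustration of the chunking and per-chunk noise described above, the sketch below splits frame indices into fixed-length chunks and assigns each chunk a noise level that increases with its position in time. The linear ramp, the function names, and the `max_noise` parameter are assumptions made for this example. The abstract does not specify the actual schedule.

```python
def split_into_chunks(num_frames, chunk_len):
    """Group consecutive frame indices into fixed-length chunks."""
    if chunk_len <= 0 or num_frames % chunk_len != 0:
        raise ValueError("num_frames must be a positive multiple of chunk_len")
    return [list(range(start, start + chunk_len))
            for start in range(0, num_frames, chunk_len)]

def chunk_noise_levels(num_chunks, max_noise=1.0):
    """Assumed linear ramp: later chunks carry strictly more noise."""
    return [max_noise * (i + 1) / num_chunks for i in range(num_chunks)]

chunks = split_into_chunks(num_frames=12, chunk_len=4)
levels = chunk_noise_levels(len(chunks))
```

Because earlier chunks are always cleaner than later ones, a model trained this way can condition each chunk on less-noisy predecessors, which matches the causal, streaming behavior the abstract describes.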
MM | Jul 5, 2016
Dynamic Flow Scheduling Strategy in Multihoming Video CDNs
Ming Ma, Zhi Wang, Yankai Zhang et al.
Multihoming for a video Content Delivery Network (CDN) allows edge peering servers to deliver video chunks through different Internet Service Providers (ISPs), improving the quality of service (QoS) for video streaming users. However, traditional strategies for a multihoming video CDN are designed around static rules, e.g., sending traffic via the same ISP as the client's, and therefore fail to dynamically allocate resources among different ISPs over time. In this paper, we perform measurement studies to demonstrate that such static allocation mechanisms fail to fully utilize the resources of multiple ISPs. To address this problem, we propose a dynamic flow scheduling strategy for multihoming video CDNs. The challenge is to find the control parameters that can guide ISP selection when performing flow scheduling. Using a data-driven approach, we identify the factors that have a major impact on the performance improvement achieved by dynamic flow scheduling. We further use an information gain approach to generate parameter combinations that can guide the flow scheduling, i.e., determine which ISP should serve each request. Our evaluation results demonstrate that our design performs flow scheduling effectively. In particular, it yields near-optimal performance in a simulation of a real-world multihoming setup.
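The information gain step can be illustrated with the standard entropy-based definition: a factor is useful for scheduling if knowing its value reduces uncertainty about which ISP performs best. The sketch below is a generic computation, not the paper's pipeline. The factor names and the sample records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of labels minus their expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical records: which ISP gave the best QoS for each request.
best_isp = ["A", "A", "B", "B"]
time_bucket = ["peak", "peak", "off", "off"]    # separates the labels perfectly
client_region = ["north", "south", "north", "south"]  # carries no information here

gain_time = information_gain(time_bucket, best_isp)
gain_region = information_gain(client_region, best_isp)
```

In this toy data, `time_bucket` has the higher gain, so it would be preferred as a control parameter. Ranking candidate factors or factor combinations by gain is one plausible way to select the parameters that guide ISP selection.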