Beitong Zhou

CV
h-index11
7papers
647citations
Novelty49%
AI Score52

7 Papers

CVAug 14, 2025Code
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Zhangxuan Gu, Zhengwen Zeng, Zhenyu Xu et al.

We present UI-Venus, a native UI agent that takes only screenshots as input based on a multimodal large language model. UI-Venus achieves SOTA performance on both UI grounding and navigation tasks using only several hundred thousand high-quality training samples through reinforcement finetune (RFT) based on Qwen2.5-VL. Specifically, the 7B and 72B variants of UI-Venus obtain 94.1% / 50.8% and 95.3% / 61.9% on the standard grounding benchmarks, i.e., Screenspot-V2 / Pro, surpassing the previous SOTA baselines including open-source GTA1 and closed-source UI-TARS-1.5. To show UI-Venus's summary and planing ability, we also evaluate it on the AndroidWorld, an online UI navigation arena, on which our 7B and 72B variants achieve 49.1% and 65.9% success rate, also beating existing models. To achieve this, we introduce carefully designed reward functions for both UI grounding and navigation tasks and corresponding efficient data cleaning strategies. To further boost navigation performance, we propose Self-Evolving Trajectory History Alignment & Sparse Action Enhancement that refine historical reasoning traces and balances the distribution of sparse but critical actions, leading to more coherent planning and better generalization in complex UI tasks. Our contributions include the publish of SOTA open-source UI agents, comprehensive data cleaning protocols and a novel self-evolving framework for improving navigation performance, which encourage further research and development in the community. Code is available at https://github.com/inclusionAI/UI-Venus.

CVDec 18, 2025
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

Beitong Zhou, Zhexiao Huang, Yuan Guo et al.

GUI grounding is a critical component in building capable GUI agents. However, existing grounding benchmarks suffer from significant limitations: they either provide insufficient data volume and narrow domain coverage, or focus excessively on a single platform and require highly specialized domain knowledge. In this work, we present VenusBench-GD, a comprehensive, bilingual benchmark for GUI grounding that spans multiple platforms, enabling hierarchical evaluation for real-word applications. VenusBench-GD contributes as follows: (i) we introduce a large-scale, cross-platform benchmark with extensive coverage of applications, diverse UI elements, and rich annotated data, (ii) we establish a high-quality data construction pipeline for grounding tasks, achieving higher annotation accuracy than existing benchmarks, and (iii) we extend the scope of element grounding by proposing a hierarchical task taxonomy that divides grounding into basic and advanced categories, encompassing six distinct subtasks designed to evaluate models from complementary perspectives. Our experimental findings reveal critical insights: general-purpose multimodal models now match or even surpass specialized GUI models on basic grounding tasks. In contrast, advanced tasks, still favor GUI-specialized models, though they exhibit significant overfitting and poor robustness. These results underscore the necessity of comprehensive, multi-tiered evaluation frameworks.

CVFeb 27, 2024Code
PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning

Xiaoyu Liu, Beitong Zhou, Zuogong Yue et al.

Recently, the usage of Contrastive Representation Learning (CRL) as a pre-training technique improves the performance of learning with noisy labels (LNL) methods. However, instead of pre-training, when trivially combining CRL loss with LNL methods as an end-to-end framework, the empirical experiments show severe degeneration of the performance. We verify through experiments that this issue is caused by optimization conflicts of losses and propose an end-to-end \textbf{PLReMix} framework by introducing a Pseudo-Label Relaxed (PLR) contrastive loss. This PLR loss constructs a reliable negative set of each sample by filtering out its inappropriate negative pairs, alleviating the loss conflicts by trivially combining these losses. The proposed PLR loss is pluggable and we have integrated it into other LNL methods, observing their improved performance. Furthermore, a two-dimensional Gaussian Mixture Model is adopted to distinguish clean and noisy samples by leveraging semantic information and model outputs simultaneously. Experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed method. Code is available at \url{https://github.com/lxysl/PLReMix}.

CVFeb 9Code
UI-Venus-1.5 Technical Report

Veuns-Team, Changlong Gao, Zhangxuan Gu et al.

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging.In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications.The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios.Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus

LGApr 1, 2019
A Novel GAN-based Fault Diagnosis Approach for Imbalanced Industrial Time Series

Wenqian Jiang, Cheng Cheng, Beitong Zhou et al.

This paper proposes a novel fault diagnosis approach based on generative adversarial networks (GAN) for imbalanced industrial time series where normal samples are much larger than failure cases. We combine a well-designed feature extractor with GAN to help train the whole network. Aimed at obtaining data distribution and hidden pattern in both original distinguishing features and latent space, the encoder-decoder-encoder three-sub-network is employed in GAN, based on Deep Convolution Generative Adversarial Networks (DCGAN) but without Tanh activation layer and only trained on normal samples. In order to verify the validity and feasibility of our approach, we test it on rolling bearing data from Case Western Reserve University and further verify it on data collected from our laboratory. The results show that our proposed approach can achieve excellent performance in detecting faulty by outputting much larger evaluation scores.

LGMar 2, 2019
Wasserstein Distance based Deep Adversarial Transfer Learning for Intelligent Fault Diagnosis

Cheng Cheng, Beitong Zhou, Guijun Ma et al.

The demand of artificial intelligent adoption for condition-based maintenance strategy is astonishingly increased over the past few years. Intelligent fault diagnosis is one critical topic of maintenance solution for mechanical systems. Deep learning models, such as convolutional neural networks (CNNs), have been successfully applied to fault diagnosis tasks for mechanical systems and achieved promising results. However, for diverse working conditions in the industry, deep learning suffers two difficulties: one is that the well-defined (source domain) and new (target domain) datasets are with different feature distributions; another one is the fact that insufficient or no labelled data in target domain significantly reduce the accuracy of fault diagnosis. As a novel idea, deep transfer learning (DTL) is created to perform learning in the target domain by leveraging information from the relevant source domain. Inspired by Wasserstein distance of optimal transport, in this paper, we propose a novel DTL approach to intelligent fault diagnosis, namely Wasserstein Distance based Deep Transfer Learning (WD-DTL), to learn domain feature representations (generated by a CNN based feature extractor) and to minimize the distributions between the source and target domains through adversarial training. The effectiveness of the proposed WD-DTL is verified through 3 transfer scenarios and 16 transfer fault diagnosis experiments of both unsupervised and supervised (with insufficient labelled data) learning. We also provide a comprehensive analysis of the network visualization of those transfer tasks.

LGDec 17, 2018
A General End-to-end Diagnosis Framework for Manufacturing Systems

Ye Yuan, Guijun Ma, Cheng Cheng et al.

The manufacturing sector is envisioned to be heavily influenced by artificial intelligence-based technologies with the extraordinary increases in computational power and data volumes. A central challenge in manufacturing sector lies in the requirement of a general framework to ensure satisfied diagnosis and monitoring performances in different manufacturing applications. Here we propose a general data-driven, end-to-end framework for the monitoring of manufacturing systems. This framework, derived from deep learning techniques, evaluates fused sensory measurements to detect and even predict faults and wearing conditions. This work exploits the predictive power of deep learning to automatically extract hidden degradation features from noisy, time-course data. We have experimented the proposed framework on ten representative datasets drawn from a wide variety of manufacturing applications. Results reveal that the framework performs well in examined benchmark applications and can be applied in diverse contexts, indicating its potential use as a critical corner stone in smart manufacturing.