Mengyu Zhang

AI
h-index45
3papers
9citations
Novelty58%
AI Score44

3 Papers

CLFeb 4
ERNIE 5.0 Technical Report

Haifeng Wang, Hua Wu, Tian Wu et al.

In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model desinged for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, thereby guaranteeing efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside comprehensive empirical analysis of elastic training, aiming to offer profound insights to the community.

AINov 4, 2025
Auditable-choice reframing unlocks RL-based verification for open-ended tasks

Mengyu Zhang, Xubo Liu, Siyu Ding et al.

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated great potential in enhancing the reasoning capabilities of large language models (LLMs), achieving remarkable progress in domains such as mathematics and programming where standard answers are available. However, for open-ended tasks lacking ground-truth solutions (e.g., creative writing and instruction following), existing studies typically regard them as non-reasoning scenarios, thereby overlooking the latent value of reasoning capabilities. This raises a key question: Can strengthening reasoning improve performance in open-ended tasks? To address this, we explore the transfer of the RLVR paradigm to the open domain. Yet, since RLVR fundamentally relies on verifiers that presuppose the existence of standard answers, it cannot be directly applied to open-ended tasks. To overcome this challenge, we introduce Verifiable Multiple-Choice Reformulation (VMR), a novel training strategy that restructures open-ended data into verifiable multiple-choice formats, enabling effective training even in the absence of explicit ground truth. Experimental results on multiple benchmarks validate the effectiveness of our method in improving LLM performance on open-ended tasks. Notably, across eight open-ended benchmarks, our VMR-based training delivers an average gain of 5.99 points over the baseline. Code will be released upon acceptance to facilitate reproducibility.

CRJul 30, 2025
Agentic Privacy-Preserving Machine Learning

Mengyu Zhang, Zhuotao Liu, Jingwen Huang et al.

Privacy-preserving machine learning (PPML) is critical to ensure data privacy in AI. Over the past few years, the community has proposed a wide range of provably secure PPML schemes that rely on various cryptography primitives. However, when it comes to large language models (LLMs) with billions of parameters, the efficiency of PPML is everything but acceptable. For instance, the state-of-the-art solution for confidential LLM inference represents at least 10,000-fold slower performance compared to plaintext inference. The performance gap is even larger when the context length increases. In this position paper, we propose a novel framework named Agentic-PPML to make PPML in LLMs practical. Our key insight is to employ a general-purpose LLM for intent understanding and delegate cryptographically secure inference to specialized models trained on vertical domains. By modularly separating language intent parsing - which typically involves little or no sensitive information - from privacy-critical computation, Agentic-PPML completely eliminates the need for the LLMs to process the encrypted prompts, enabling practical deployment of privacy-preserving LLM-centric services.