Chengyue Yu

AI
h-index9
4papers
49citations
Novelty55%
AI Score60

4 Papers

AIMar 10Code
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

Zuhao Zhang, Chengyue Yu, Yuante Li et al.

With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps. These applications require models to not only render visual interfaces but also construct customized interaction logic that adheres to real-world principles. However, existing benchmarks primarily focus on algorithmic correctness or static layout reconstruction, failing to capture the capabilities required for this new paradigm. To address this gap, we introduce MiniAppBench, the first comprehensive benchmark designed to evaluate principle-driven, interactive application generation. Sourced from a real-world application with 10M+ generations, MiniAppBench distills 500 tasks across six domains (e.g., Games, Science, and Tools). Furthermore, to tackle the challenge of evaluating open-ended interactions where no single ground truth exists, we propose MiniAppEval, an agentic evaluation framework. Leveraging browser automation, it performs human-like exploratory testing to systematically assess applications across three dimensions: Intention, Static, and Dynamic. Our experiments reveal that current LLMs still face significant challenges in generating high-quality MiniApps, while MiniAppEval demonstrates high alignment with human judgment, establishing a reliable standard for future research. Our code is available in github.com/MiniAppBench.

CLJan 7, 2024Code
CharPoet: A Chinese Classical Poetry Generation System Based on Token-free LLM

Chengyue Yu, Lei Zang, Jiaotuan Wang et al.

Automatic Chinese classical poetry generation has attracted much research interest, but achieving effective control over format and content simultaneously remains challenging. Traditional systems usually accept keywords as user inputs, resulting in limited control over content. Large language models (LLMs) improve content control by allowing unrestricted user instructions, but the token-by-token generation process frequently makes format errors. Motivated by this, we propose CharPoet, a Chinese classical poetry generation system based on token-free LLM, which provides effective control over both format and content. Our token-free architecture generates in a character-by-character manner, enabling precise control over the number of characters. Pruned from existing token-based LLMs, CharPoet inherits their pretrained capabilities and can generate poetry following instructions like "Write me a poem for my mother's birthday." CharPoet achieves format accuracy above 0.96, outperforming Jiuge-GPT-2 (0.91) and GPT-4 (0.38). In terms of content quality, CharPoet surpasses traditional systems including Jiuge, and is comparable to other LLMs. Our system is open source and available at https://modelscope.cn/models/CharPoet/CharPoet. A video demonstration of CharPoet is available at https://youtu.be/voZ25qEp3Dc.

AIAug 28, 2025Code
AWorld: Orchestrating the Training Recipe for Agentic AI

Chengyue Yu, Siyuan Lu, Chenyi Zhuang et al.

The learning from practice paradigm is crucial for developing capable Agentic AI systems, yet it is severely hampered by inefficient experience generation, a bottleneck especially pronounced in complex benchmarks like GAIA. To address this, we introduce AWorld, an open-source system engineered for large-scale agent-environment interaction. By distributing tasks across a cluster, AWorld accelerates experience collection by 14.6x compared to standard single-node, sequential execution. This critical speedup makes extensive reinforcement learning practical and scalable. Leveraging this capability, we trained a Qwen3-32B-based agent that achieves pass@1 accuracy of 32.23% on the GAIA test set, which surpasses GPT-4o (27.91%) and rivals DeepSeek-V3 (31.89%). Our open-source system and the resulting agent provide a practical blueprint for a complete agentic AI training pipeline, from efficient interaction to demonstrable model improvement.

AIAug 13, 2025Code
Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld

Zhitian Xie, Qintong Wu, Chengyue Yu et al.

The rapid advancement of large language models (LLMs) has empowered intelligent agents to leverage diverse external tools for solving complex real-world problems. However, this reliance introduces new challenges, as extended contexts and noisy tool outputs can undermine system reliability. To address this, we propose a dynamic Multi-Agent System (MAS) in our AWorld framework, where an Execution Agent is supervised by a Guard Agent that provides on-demand dynamic maneuvering, verifying and correcting the reasoning process to improve robustness over single-agent systems. To move beyond this generic supervision, we enhance the architecture with a methodology inspired by System Identification from control theory. This method first profiles the Execution Agent offline on a benchmark dataset to create a "performance fingerprint" of its unique weaknesses. The Guard Agent then leverages this fingerprint online to deliver profile-aware supervision, making targeted interventions based on known failure patterns rather than merely reacting to immediate logical flaws. Extensive experiments on the GAIA dataset demonstrate that this profile-aware MAS significantly improves both effectiveness and stability, outperforming not only single-agent systems but also its naive counterpart. This superior performance led our system to achieve first place among open-source projects on the prestigious GAIA leaderboard. These findings highlight that building truly trustworthy intelligent systems requires not just collaboration, but a deep, empirically-grounded understanding of each agent's unique capabilities and limitations.