Zhicheng Zhu

AI
h-index 7
3 papers
1 citation
Novelty 40%
AI Score 39

3 Papers

AI · Dec 23, 2025
Scaling Reinforcement Learning for Content Moderation with Large Language Models

Hamed Firooz, Rui Liu, Yuchen Lu et al.

Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for policy violations. Although recent advances in large language models (LLMs) have demonstrated strong potential for policy-grounded moderation, the practical challenges of training these systems to achieve expert-level accuracy in real-world settings remain largely unexplored, particularly in regimes characterized by label sparsity, evolving policy definitions, and the need for nuanced reasoning beyond shallow pattern matching. In this work, we present a comprehensive empirical investigation of scaling reinforcement learning (RL) for content classification, systematically evaluating multiple RL training recipes and reward-shaping strategies, including verifiable rewards and LLM-as-judge frameworks, to transform general-purpose language models into specialized, policy-aligned classifiers across three real-world content moderation tasks. Our findings provide actionable insights for industrial-scale moderation systems, demonstrating that RL exhibits sigmoid-like scaling behavior in which performance improves smoothly with increased training data, rollouts, and optimization steps before gradually saturating. Moreover, we show that RL substantially improves performance on tasks requiring complex policy-grounded reasoning while achieving up to 100x higher data efficiency than supervised fine-tuning, making it particularly effective in domains where expert annotations are scarce or costly.

CO · Mar 13
Aromatic and clumped multi-indices: algebraic structure and Hopf embeddings

Zhicheng Zhu, Adrien Busnot Laurent

Butcher forests extend naturally into aromatic and clumped forests and play a fundamental role in the numerical analysis of volume-preserving methods. The description of numerical volume preservation is filled with open problems, and recent attempts have shown progress on specific dynamics and in low dimension. Following this trend, we introduce aromatic and clumped multi-indices, which are simpler algebraic objects that better describe Taylor expansions in low dimension. We provide their algebraic structures of pre-Lie-Rinehart algebra, Hopf algebroid, and Hopf algebra, and we generalise to the aromatic context the Hopf embedding from multi-indices to the BCK Hopf algebra.

LG · Jul 16, 2025
BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training

Rui Li, Xiaoyun Zhi, Jinxin Chi et al.

Large Language Models (LLMs) have become a cornerstone of modern AI, driving breakthroughs in natural language processing and expanding into multimodal jobs involving images, audio, and video. As with most computational software, it is important to distinguish between ordinary runtime performance and startup overhead. Prior research has focused on runtime performance: improving training efficiency and stability. This work focuses instead on the increasingly critical issue of startup overhead in training: the delay before training jobs begin execution. Startup overhead is particularly important in large, industrial-scale LLMs, where failures occur more frequently and multiple teams operate in iterative update-debug cycles. In one of our training clusters, more than 3.5% of GPU time is wasted due to startup overhead alone. In this work, we present the first in-depth characterization of LLM training startup overhead based on real production data. We analyze the components of startup cost, quantify its direct impact, and examine how it scales with job size. These insights motivate the design of Bootseer, a system-level optimization framework that addresses three primary startup bottlenecks: (a) container image loading, (b) runtime dependency installation, and (c) model checkpoint resumption. To mitigate these bottlenecks, Bootseer introduces three techniques: (a) hot block record-and-prefetch, (b) dependency snapshotting, and (c) striped HDFS-FUSE. Bootseer has been deployed in a production environment and evaluated on real LLM training workloads, demonstrating a 50% reduction in startup overhead.