Chengyao Wen

h-index10
2papers

2 Papers

CLOct 21, 2025Code
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Ling Team, Anqi Shen, Baihui Li et al.

We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving instability from training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, thereby obtaining high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-1. Notably, it attains a silver medal-level result on the IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T parameter MoE model to the community, we provide the research community with direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance.

LGNov 2, 2023
On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

Chengyao Wen, Yin Lou

Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, a two-stage framework of fraud prevention decision rule set mining is usually employed in large Fintech institutions; Stage 1 generates a potentially large pool of rules and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stage 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitutes a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. On two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.