Xuandi Ren

32.0 · CL · Apr 1

Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Qianfan Zhang, Tianyu Guo, Xuandi Ren et al.

We study how to scale reasoning token budgets for competitive programming through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking. During RL training, we observe an approximately log-linear relationship between validation accuracy and the average number of generated reasoning tokens over successive checkpoints, and show two ways to shift this training trajectory: verification RL warmup raises the starting point, while randomized clipping produces a steeper trend in the observed regime. As scaling single-generation reasoning during RL quickly becomes expensive under full attention, we introduce a multi-round parallel thinking pipeline that distributes the token budget across threads and rounds of generation, verification, and refinement. We train the model end-to-end on this pipeline to match the training objective to the test-time structure. Starting from Seed-OSS-36B, the full system with 16 threads and 16 rounds per thread matches the underlying RL model's oracle pass@16 at pass@1 using 7.6 million tokens per problem on average, and surpasses GPT-5-high on 456 hard competitive programming problems from AetherCode.
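As a rough illustration of the "approximately log-linear relationship" between validation accuracy and average reasoning tokens, the sketch below fits accuracy ≈ a + b · ln(tokens) by ordinary least squares. The function name `fit_log_linear` and the data points are hypothetical and synthetic; they are not taken from the paper and do not reproduce its results.

```python
import math

def fit_log_linear(tokens, accuracy):
    """Least-squares fit of accuracy ~ a + b * ln(tokens).

    tokens:   average reasoning tokens per checkpoint (positive numbers)
    accuracy: validation accuracy at the same checkpoints
    Returns (a, b): intercept and slope on the ln(tokens) axis.
    """
    xs = [math.log(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic checkpoints (assumption: made-up numbers for illustration only).
tokens = [1000 * 2 ** i for i in range(5)]
accuracy = [0.10 + 0.05 * math.log(t / 1000) for t in tokens]

a, b = fit_log_linear(tokens, accuracy)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

In this picture, a higher starting point corresponds to a larger intercept `a`, and a steeper trend corresponds to a larger slope `b`.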

10.7 · CC · May 12

Strong Inapproximability for a Promise Rank Problem

Venkatesan Guruswami, Xuandi Ren, Shaoxuan Tang

Given a linear subspace of $n \times n$ matrices over $\mathbb F_{2^r}$ that is promised to contain a matrix of rank $1$, we prove that it is hard to find a matrix of rank $n^{o(1/\log \log n)}$, assuming NP doesn't have sub-exponential algorithms. In addition to being a basic problem, the hardness of this problem, even for the exact version, drove recent PCP-free inapproximability results for minimum distance and shortest vector problems concerning codes and lattices. The proof combines the concept of superposition soundness introduced by Khot and Saket with moment matrices. To produce a rank-gap of $1$ vs. $k$, the reduction runs in time $n^{O(\log k)}$. We also give another moment-matrix-based construction which runs in time $n^{O(k)}$ but works for any finite field $\mathbb F_q$.
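To make the problem statement concrete, here is a minimal sketch of the underlying search task in the simplest setting $\mathbb F_2$ (that is, $r = 1$, an assumption made only for this example). It brute-forces every nonzero combination of the given spanning matrices and reports the smallest rank found. The names `rank_gf2` and `min_rank_in_span` are hypothetical, the running time is exponential in the number of spanning matrices, and the code has nothing to do with the paper's reduction.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            h = row.bit_length() - 1
            if h not in pivots:
                pivots[h] = row
                break
            row ^= pivots[h]  # cancel the leading bit and keep reducing
    return len(pivots)

def min_rank_in_span(matrices):
    """Smallest rank of a nonzero matrix in the F_2-span of `matrices`.

    Each matrix is a list of row bitmasks, all with the same number of rows.
    Returns None if the span contains only the zero matrix.
    """
    best = None
    for coeffs in product([0, 1], repeat=len(matrices)):
        combo = [0] * len(matrices[0])
        for c, mat in zip(coeffs, matrices):
            if c:
                combo = [x ^ y for x, y in zip(combo, mat)]
        r = rank_gf2(combo)
        if r == 0:
            continue  # zero matrix (trivial or dependent combination)
        if best is None or r < best:
            best = r
    return best

# Two rank-2 matrices over F_2 whose sum has rank 1.
A = [0b10, 0b01]  # identity
B = [0b11, 0b01]  # [[1, 1], [0, 1]]
print(rank_gf2(A), rank_gf2(B), min_rank_in_span([A, B]))
```

Here neither spanning matrix has rank 1, yet their sum does, which is why the promise alone does not hand the solver a low-rank matrix.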

2 Papers