Qianfan Zhang

2 Papers

29.2 GT May 21

Single-Item Auctions with a Monopolist Intermediary

Jingyi Liu, Aviad Rubinstein, Ertem Nusret Tas et al.

Classical optimal auction theory assumes that bids reach the seller directly. We study how this picture changes when a revenue-maximizing intermediary controls access to the seller's auction. Motivated by blockchain auctions, online platforms, and other intermediated markets, we consider a single-item auction with independent private values and a monopolist intermediary who can decide which bidder messages are forwarded to the seller. We establish approximation guarantees and impossibility results across three timing models: seller-first, intermediary-first, and simultaneous. In the seller-first model, arbitrary deterministic seller mechanisms collapse to posted-price mechanisms, and the intermediary's best response is a shifted Myerson auction. This yields a sharp separation: for regular distributions, the seller's revenue can be arbitrarily small relative to the no-intermediary optimum, while for $α$-strongly regular distributions, posted prices recover a constant fraction of the optimum with a tight dependence on $α$. We further show that timing matters: neither Stackelberg order uniformly dominates, and simultaneous play can leave both parties unboundedly worse off than in either sequential model.
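A minimal sketch of the posted-price benchmark mentioned in the abstract: for a single bidder with value distribution $F$, a posted price $p$ earns expected revenue $p\,(1 - F(p))$, and the snippet below finds the best such price by grid search. The uniform distribution on $[0, 1]$, the function names, and the grid resolution are illustrative assumptions rather than anything taken from the paper, and the intermediary is not modeled here.

```python
from typing import Callable, Tuple


def posted_price_revenue(price: float, cdf: Callable[[float], float]) -> float:
    """Expected revenue from one bidder facing a take-it-or-leave-it price."""
    return price * (1.0 - cdf(price))


def best_posted_price(
    cdf: Callable[[float], float], lo: float, hi: float, steps: int = 10_000
) -> Tuple[float, float]:
    """Grid-search the revenue-maximizing posted price on [lo, hi]."""
    best_p, best_r = lo, posted_price_revenue(lo, cdf)
    for i in range(1, steps + 1):
        p = lo + (hi - lo) * i / steps
        r = posted_price_revenue(p, cdf)
        if r > best_r:
            best_p, best_r = p, r
    return best_p, best_r


def uniform_cdf(v: float) -> float:
    """CDF of the uniform distribution on [0, 1] (hypothetical example)."""
    return min(max(v, 0.0), 1.0)


price, revenue = best_posted_price(uniform_cdf, 0.0, 1.0)
print(price, revenue)
```

For the uniform example, the search should land on the familiar monopoly price of one half, where the bidder buys with probability one half.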

32.0 CL Apr 1

Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Qianfan Zhang, Tianyu Guo, Xuandi Ren et al.

We study how to scale reasoning token budgets for competitive programming through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking. During RL training, we observe an approximately log-linear relationship between validation accuracy and the average number of generated reasoning tokens over successive checkpoints, and show two ways to shift this training trajectory: verification RL warmup raises the starting point, while randomized clipping produces a steeper trend in the observed regime. As scaling single-generation reasoning during RL quickly becomes expensive under full attention, we introduce a multi-round parallel thinking pipeline that distributes the token budget across threads and rounds of generation, verification, and refinement. We train the model end-to-end on this pipeline to match the training objective to the test-time structure. Starting from Seed-OSS-36B, the full system with 16 threads and 16 rounds per thread matches the underlying RL model's oracle pass@16 at pass@1 using 7.6 million tokens per problem on average, and surpasses GPT-5-high on 456 hard competitive programming problems from AetherCode.
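For readers unfamiliar with the pass@k metric referenced in the abstract, the sketch below implements the commonly used unbiased estimator: given $n$ sampled solutions of which $c$ are correct, it estimates the probability that at least one of $k$ samples drawn without replacement is correct. Whether the paper computes its pass@16 and pass@1 numbers with exactly this estimator is an assumption; the function name and the example counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total samples generated for a problem
    c: number of those samples that are correct
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k includes a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 16 samples, 1 correct.
print(pass_at_k(16, 1, 1))   # single-sample success rate
print(pass_at_k(16, 1, 16))  # "oracle" view: any of the 16 may succeed
```

The gap between the two printed values illustrates why matching oracle pass@16 at pass@1 is a demanding target: the oracle only needs one correct sample among many, while pass@1 requires the system to select or produce it in a single answer.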