Zhe Ju

CL Dec 2, 2025

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-AI, Aixin Liu, Aoxue Mei et al.

We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.

20.9 DS Apr 12

Edge-Tilting Field Dynamics: Rapid Mixing at the Uniqueness Threshold and Optimal Mixing for Swendsen-Wang Dynamics

Xiaoyu Chen, Zhe Ju, Tianshun Miao et al.

We prove two results on the mixing times of Markov chains for two-spin systems. First, we show that the Glauber dynamics mixes in polynomial time for the Gibbs distributions of antiferromagnetic two-spin systems at the critical threshold of the uniqueness phase transition of the Gibbs measure on infinite regular trees. This completes the computational phase transition picture for antiferromagnetic two-spin systems, which includes near-linear-time optimal mixing in the uniqueness regime [Chen--Liu--Vigoda, STOC '21; Chen--Feng--Yin--Zhang, FOCS '22], NP-hardness of approximate sampling in the non-uniqueness regime [Sly--Sun, FOCS '12], and polynomial-time mixing at criticality (this work). Second, we prove an optimal $O(\log n)$ mixing time bound as well as an optimal $\Omega(1)$ spectral gap for the Swendsen--Wang dynamics for the ferromagnetic Ising model with an external field on bounded-degree graphs. To the best of our knowledge, these are the first sharp bounds on the mixing rate of this classical global Markov chain beyond mean-field or strong spatial mixing (SSM) regimes, and resolve a conjecture of [Feng--Guo--Wang, IANDC '23]. A key ingredient in both proofs is a new family of localization schemes that extends the field dynamics of [Chen--Feng--Yin--Zhang, FOCS '21] by tilting general edge (or hyperedge) weights rather than vertex fields. This framework, which subsumes the classical Swendsen--Wang dynamics as a special case, extends the localization framework of [Chen--Eldan, FOCS '22] beyond stochastic and field localizations, and enables controlled tilting of interaction strengths while preserving external fields.

2 Papers