CLApr 10, 2025
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement LearningByteDance Seed, Jiaze Chen, Tiantian Fan et al. · bytedance
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.
56.5DCMay 26
A Formal Semantics of C with OpenMP ParallelismKe Du, Anshu Sharma, Liyi Li et al.
OpenMP is a popular parallelization framework that lets users transform sequential code into parallel code with a few simple annotations. Unfortunately, it is also easy to inadvertently introduce errors by adding OpenMP pragmas into otherwise correct programs, including both logic errors and race conditions. We present a formal semantics for C code with OpenMP directives, building on the C semantics of the CompCert verified compiler and its extension to concurrency. Our semantics captures subtle interactions between OpenMP directives and variable state that have been obscured by previous OpenMP semantics, and provides a basis for detecting undesired behaviors introduced by incorrect annotations: in particular, any successful execution is guaranteed to be free of data races.
38.7QUANT-PHApr 3
DisQ: A Model of Distributed Quantum Processors (Extended Version)Le Chang, Saitej Yavvari, Rance Cleaveland et al.
The next generation of distributed quantum processors combines single-location quantum computing and quantum networking techniques to permit large entangled qubit groups to be established through remote processors, and quantum algorithms can be executed distributively. We present DisQ, as the first formal model of distributed quantum processors, and permit the analysis of distributed quantum programs in the new computation environment. The core of DisQ is a distributed quantum programming language that combines the concepts of Chemical Abstract Machine (CHAM) and Markov Decision Processes (MDP) with the objective of providing clearly distinguishing quantum concurrent and distributed behaviors. Based on the DisQ language, we develop a simulation relation, based on classical simulation infrastructure, to check the equivalence of a quantum algorithm and its distributed versions so that users can develop the distributed version of a sequential quantum program via a simulation check.
PLJan 31, 2022
A Formal Model of Checked CLiyi Li, Yiyun Liu, Deena L. Postol et al.
We present a formal model of Checked C, a dialect of C that aims to enforce spatial memory safety. Our model pays particular attention to the semantics of dynamically sized, potentially null-terminated arrays. We formalize this model in Coq, and prove that any spatial memory safety errors can be blamed on portions of the program labeled unchecked; this is a Checked C feature that supports incremental porting and backward compatibility. While our model's operational semantics uses annotated ("fat") pointers to enforce spatial safety, we show that such annotations can be safely erased: Using PLT Redex we formalize an executable version of our model and a compilation procedure from it to an untyped C-like language, and use randomized testing to validate that generated code faithfully simulates the original. Finally, we develop a custom random generator for well-typed and almost-well-typed terms in our Redex model, and use it to search for inconsistencies between our model and the Clang Checked C implementation. We find these steps to be a useful way to co-develop a language (Checked C is still in development) and a core model of it.