Jiacheng Qian

h-index: 2

2 Papers

AI · May 13, 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

Xiaoyang Chen, Xinan Dai, Yu Du et al.

To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as evidenced by benchmarks on elementary to undergraduate-level mathematical tasks, the creative capabilities of these models have received comparatively little attention, and evaluation datasets remain scarce. To address this gap, we propose evaluation criteria for mathematical creativity and introduce DeepMath-Creative, a novel, high-quality benchmark comprising constructive problems across algebra, geometry, analysis, and other domains. We use this dataset to conduct a systematic evaluation of the creative problem-solving abilities of mainstream LLMs. Experimental results show that even under lenient scoring criteria (which emphasize core solution components and disregard minor inaccuracies such as small logical gaps, incomplete justifications, or redundant explanations), the best-performing model, O3 Mini, achieves only 70% accuracy, primarily on basic undergraduate-level constructive tasks. Performance declines sharply on more complex problems, and models fail to provide substantive strategies for open problems. These findings suggest that, although current LLMs display a degree of constructive proficiency on familiar and lower-difficulty problems, such performance is likely attributable to the recombination of memorized patterns rather than authentic creative insight or novel synthesis.

NI · Mar 19
RUBICONe: Wireless RAFT-Unified Behaviors for Intervehicular Cooperative Operations and Negotiations

Zhenghua Hu, Tairan Dan, Zeyu Tao et al.

Just as Caesar declared "alea iacta est" (the die is cast) upon crossing the Rubicone river, lane change decisions in autonomous vehicles also represent critical points of no return. RUBICONe addresses this challenge by recognizing that lane change decision-making relying solely on a single vehicle's perception would be as precarious as crossing an unknown river alone. By implementing a distributed consensus framework that extends the RAFT algorithm with wireless connectivity, RUBICONe enables multiple vehicles to collectively process and aggregate their perceptions. Using multiple software-defined radio (SDR) devices as the experimental platform, this study demonstrates how consensus-based decision-making significantly reduces the impact of environmental interference and mitigates the risk of misjudgments by individual vehicles. Just as crossing the Rubicone marked a point of irrevocable action backed by collective intelligence, RUBICONe ensures that lane change decisions are made with comprehensive situational awareness and distributed consensus, showcasing the reliability gain of consensus in wireless communications.
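The reliability argument above, that pooling several vehicles' perceptions under a majority rule lowers the chance of acting on one vehicle's misjudgment, can be illustrated with a minimal sketch. This is not RUBICONe's protocol: it assumes each vehicle misjudges independently with the same probability `p` and keeps only the RAFT-style idea of committing a decision once a strict majority quorum agrees, omitting leader election, log replication, and the wireless channel. The function names `majority_error_probability` and `commit_decision` are hypothetical.

```python
from math import comb
from typing import List, Optional


def majority_error_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n vehicles misjudge the lane.

    Hypothetical model: each vehicle errs independently with probability p.
    """
    quorum = n // 2 + 1  # RAFT-style majority quorum
    # Sum the binomial probabilities of k wrong vehicles for k >= quorum.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(quorum, n + 1))


def commit_decision(votes: List[bool]) -> Optional[bool]:
    """Commit a lane-change decision only if a majority quorum agrees.

    True = change lane, False = stay; None = no quorum, so keep the current lane.
    """
    quorum = len(votes) // 2 + 1
    yes = sum(votes)
    if yes >= quorum:
        return True
    if len(votes) - yes >= quorum:
        return False
    return None


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, majority_error_probability(n, 0.1))
    print(commit_decision([True, True, False]))
```

Under these assumptions, a per-vehicle error rate of 0.1 drops to roughly 0.028 with three vehicles and roughly 0.0086 with five. Real perception errors are often correlated (for example, shared interference), which would shrink this gain.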