CLApr 22, 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language ModelsShi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.
Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.
QUANT-PHMay 9, 2025
Scalable Quantum State Preparation via Large-Language-Model-Driven DiscoveryQing-Hong Cao, Zong-Yue Hou, Ying-Ying Li et al.
Efficient quantum state preparation remains a central challenge in first-principles quantum simulations of dynamics in quantum field theories, where the Hilbert space is intrinsically infinite-dimensional. Here, we introduce a large language model (LLM)-assisted framework for quantum-circuit design that systematically scales state-preparation circuits to large lattice volumes. Applied to a 1+1d XY spin chain, the LLM autonomously discovers a compact 4-parameter circuit that captures boundary-induced symmetry breaking with sub-percent energy deviation, enabling successful validation on the \texttt{Zuchongzhi} quantum processor. Guided by this insight, we extend the framework to 2+1d quantum field theories, where scalable variational ansätze have remained elusive. For a scalar field theory, the search yields a symmetry-preserving, 3-parameter shallow-depth ansatz whose optimized parameters converge to size-independent constants for lattices $n \ge 4$, providing, to our knowledge, the first scalable ansatz for this class of 2+1d models. Our results establish a practical route toward AI-assisted, human-guided discovery in quantum simulation.
COMP-PHOct 9, 2025
Iterated Agent for Symbolic RegressionZhuo-Yang Song, Zeyu Cai, Shutao Zhang et al.
Symbolic regression (SR), the automated discovery of mathematical expressions from data, is a cornerstone of scientific inquiry. However, it is often hindered by the combinatorial explosion of the search space and a tendency to overfit. Popular methods, rooted in genetic programming, explore this space syntactically, often yielding overly complex, uninterpretable models. This paper introduces IdeaSearchFitter, a framework that employs Large Language Models (LLMs) as semantic operators within an evolutionary search. By generating candidate expressions guided by natural-language rationales, our method biases discovery towards models that are not only accurate but also conceptually coherent and interpretable. We demonstrate IdeaSearchFitter's efficacy across diverse challenges: it achieves competitive, noise-robust performance on the Feynman Symbolic Regression Database (FSReD), outperforming several strong baselines; discovers mechanistically aligned models with good accuracy-complexity trade-offs on real-world data; and derives compact, physically-motivated parametrizations for Parton Distribution Functions in a frontier high-energy physics application. IdeaSearchFitter is a specialized module within our broader iterated agent framework, IdeaSearch, which is publicly available at https://www.ideasearch.cn/.
LGSep 17, 2025
RF-LSCM: Pushing Radiance Fields to Multi-Domain Localized Statistical Channel Modeling for Cellular Network OptimizationBingsheng Peng, Shutao Zhang, Xi Zheng et al.
Accurate localized wireless channel modeling is a cornerstone of cellular network optimization, enabling reliable prediction of network performance during parameter tuning. Localized statistical channel modeling (LSCM) is the state-of-the-art channel modeling framework tailored for cellular network optimization. However, traditional LSCM methods, which infer the channel's Angular Power Spectrum (APS) from Reference Signal Received Power (RSRP) measurements, suffer from critical limitations: they are typically confined to single-cell, single-grid and single-carrier frequency analysis and fail to capture complex cross-domain interactions. To overcome these challenges, we propose RF-LSCM, a novel framework that models the channel APS by jointly representing large-scale signal attenuation and multipath components within a radiance field. RF-LSCM introduces a multi-domain LSCM formulation with a physics-informed frequency-dependent Attenuation Model (FDAM) to facilitate the cross frequency generalization as well as a point-cloud-aided environment enhanced method to enable multi-cell and multi-grid channel modeling. Furthermore, to address the computational inefficiency of typical neural radiance fields, RF-LSCM leverages a low-rank tensor representation, complemented by a novel Hierarchical Tensor Angular Modeling (HiTAM) algorithm. This efficient design significantly reduces GPU memory requirements and training time while preserving fine-grained accuracy. Extensive experiments on real-world multi-cell datasets demonstrate that RF-LSCM significantly outperforms state-of-the-art methods, achieving up to a 30% reduction in mean absolute error (MAE) for coverage prediction and a 22% MAE improvement by effectively fusing multi-frequency data.
SPSep 16, 2025
A Measurement Report Data-Driven Framework for Localized Statistical Channel ModelingXinyu Qin, Ye Xue, Qi Yan et al.
Localized statistical channel modeling (LSCM) is crucial for effective performance evaluation in digital twin-assisted network optimization. Solely relying on the multi-beam reference signal receiving power (RSRP), LSCM aims to model the localized statistical propagation environment by estimating the channel angular power spectrum (APS). However, existing methods rely heavily on drive test data with high collection costs and limited spatial coverage. In this paper, we propose a measurement report (MR) data-driven framework for LSCM, exploiting the low-cost and extensive collection of MR data. The framework comprises two novel modules. The MR localization module addresses the issue of missing locations in MR data by introducing a semi-supervised method based on hypergraph neural networks, which exploits multi-modal information via distance-aware hypergraph modeling and hypergraph convolution for location extraction. To enhance the computational efficiency and solution robustness, LSCM operates at the grid level. Compared to independently constructing geographically uniform grids and estimating channel APS, the joint grid construction and channel APS estimation module enhances robustness in complex environments with spatially non-uniform data by exploiting their correlation. This module alternately optimizes grid partitioning and APS estimation using clustering and improved sparse recovery for the ill-conditioned measurement matrix and incomplete observations. Through comprehensive experiments on a real-world MR dataset, we demonstrate the superior performance and robustness of our framework in localization and channel modeling.
LOSep 18, 2019
On the Strong Equivalences of LPMLN ProgramsBin Wang, Jun Shen, Shutao Zhang et al.
By incorporating the methods of Answer Set Programming (ASP) and Markov Logic Networks (MLN), LPMLN becomes a powerful tool for non-monotonic, inconsistent and uncertain knowledge representation and reasoning. To facilitate the applications and extend the understandings of LPMLN, we investigate the strong equivalences between LPMLN programs in this paper, which is regarded as an important property in the field of logic programming. In the field of ASP, two programs P and Q are strongly equivalent, iff for any ASP program R, the programs P and Q extended by R have the same stable models. In other words, an ASP program can be replaced by one of its strong equivalent without considering its context, which helps us to simplify logic programs, enhance inference engines, construct human-friendly knowledge bases etc. Since LPMLN is a combination of ASP and MLN, the notions of strong equivalences in LPMLN is quite different from that in ASP. Firstly, we present the notions of p-strong and w-strong equivalences between LPMLN programs. Secondly, we present a characterization of the notions by generalizing the SE-model approach in ASP. Finally, we show the use of strong equivalences in simplifying LPMLN programs, and present a sufficient and necessary syntactic condition that guarantees the strong equivalence between a single LPMLN rule and the empty program.