LG · Aug 25, 2025 · Code
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
Weida Wang, Dongchen Huang, Jiatong Li et al.
We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in condensed matter physics. CMPhysBench comprises more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, and strongly correlated systems. To ensure a deep understanding of the problem-solving process, we focus exclusively on calculation problems, requiring LLMs to independently generate comprehensive solutions. Leveraging tree-based representations of expressions, we also introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of similarity between prediction and ground truth. Our results show that even the best model, Grok-4, reaches an average SEED score of only 36 and an accuracy of only 28% on CMPhysBench, underscoring a significant capability gap, especially in this practical, frontier domain relative to traditional physics. The code and dataset are publicly available at https://github.com/CMPhysBench/CMPhysBench.
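To illustrate the general idea of edit-distance-based partial credit over expression trees, here is a minimal sketch. It is not the SEED definition from the paper, which the abstract does not spell out. As simplifying assumptions, expressions are written in Python syntax, each tree is linearized in preorder, and a plain Levenshtein distance over node labels stands in for a true tree edit distance. The function names `preorder_labels`, `edit_distance`, and `partial_credit` are hypothetical.

```python
import ast


def preorder_labels(expr):
    """Parse a Python-syntax expression and return its node labels in preorder."""
    def visit(node):
        if isinstance(node, ast.BinOp):
            return [type(node.op).__name__] + visit(node.left) + visit(node.right)
        if isinstance(node, ast.UnaryOp):
            return [type(node.op).__name__] + visit(node.operand)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            labels = [node.func.id]
            for arg in node.args:
                labels += visit(arg)
            return labels
        if isinstance(node, ast.Name):
            return [node.id]
        if isinstance(node, ast.Constant):
            return [repr(node.value)]
        raise ValueError("unsupported expression node: " + type(node).__name__)

    return visit(ast.parse(expr, mode="eval").body)


def edit_distance(a, b):
    """Levenshtein distance between two label sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def partial_credit(pred, truth):
    """Score in [0, 100]; 100 means the linearized trees are identical."""
    p, t = preorder_labels(pred), preorder_labels(truth)
    return 100.0 * (1.0 - edit_distance(p, t) / max(len(p), len(t)))


if __name__ == "__main__":
    print(partial_credit("a*b + c", "a*b + c"))  # identical expressions
    print(partial_credit("a*b + c", "a*b - c"))  # one operator differs
```

Unlike a binary exact-match check, a prediction that gets one operator wrong out of five nodes still receives most of the credit. A full tree edit distance would additionally respect subtree structure, which this linearized variant ignores.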
35.5 · CR · Apr 4
ProtoGuard-SL: Prototype Consistency Based Backdoor Defense for Vertical Split Learning
Yuhan Shui, Ruobin Jin, Zhihao Dou et al.
Vertical split learning (SL) enables collaborative model training across parties holding complementary features without sharing raw data, but recent work has shown that it is highly vulnerable to poisoning-based backdoor attacks operating on intermediate embeddings. By compromising clients, adversaries can inject stealthy triggers that manipulate the server-side model while remaining difficult to detect, and existing defenses provide limited robustness against adaptive attacks. In this paper, we propose ProtoGuard-SL, a server-side defense that improves the robustness of split learning by exploiting class-conditional representation consistency in the embedding space. Our approach is motivated by the observation that benign embeddings within the same class exhibit stable semantic alignment, whereas poisoned embeddings inevitably disrupt this structure. ProtoGuard-SL adopts a two-stage framework: it first constructs robust class prototypes and transforms embeddings into a prototype-consistency representation, then applies a class-conditional, distribution-free conformal filtering strategy to identify and remove anomalous embeddings. Extensive experiments on three datasets (CIFAR-10, SVHN, and Bank Marketing) under three different attack settings demonstrate that our method achieves state-of-the-art performance.
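The two-stage idea (class prototypes, then class-conditional conformal filtering) can be sketched roughly as follows. This is an illustration under stated assumptions, not the ProtoGuard-SL algorithm, whose details the abstract does not give. It assumes a coordinate-wise median as the robust prototype, one minus cosine similarity as the nonconformity score, and a split-conformal quantile computed from a trusted calibration set for a single class. All function names and the `alpha` parameter are hypothetical.

```python
import math
from statistics import median


def prototype(embeddings):
    """Assumed robust prototype: coordinate-wise median of one class's embeddings."""
    return [median(column) for column in zip(*embeddings)]


def nonconformity(embedding, proto):
    """Assumed score: 1 - cosine similarity to the class prototype."""
    dot = sum(x * y for x, y in zip(embedding, proto))
    norm = (math.sqrt(sum(x * x for x in embedding))
            * math.sqrt(sum(y * y for y in proto)))
    return 1.0 - dot / norm if norm else 1.0


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: accept everything
        return math.inf
    return sorted(cal_scores)[k - 1]


def filter_class(cal_embeddings, new_embeddings, alpha=0.1):
    """Return one keep/drop flag per new embedding of a single class."""
    proto = prototype(cal_embeddings)
    cal_scores = [nonconformity(e, proto) for e in cal_embeddings]
    threshold = conformal_threshold(cal_scores, alpha)
    return [nonconformity(e, proto) <= threshold for e in new_embeddings]


if __name__ == "__main__":
    # Toy calibration embeddings clustered around the direction (1, 0).
    calibration = [[1.0, 0.01 * i] for i in range(-4, 5)]
    incoming = [[1.0, 0.0], [0.0, 1.0]]  # aligned vs. orthogonal embedding
    print(filter_class(calibration, incoming, alpha=0.5))
```

Running the filter separately per class is what makes it class-conditional, and the quantile rule makes no assumption about the score distribution beyond exchangeability of benign calibration and test scores.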