Qian Ma

h-index44

5papers

96citations

Novelty62%

AI Score43

Ranked #56,075 of 194,257 authors (top 29%)#12,761 in LG (top 32%)

5 Papers

32.1LGSep 2, 2025Code

Baichuan-M2: Scaling Medical Capability with Large Verifier System

Baichuan-M2 Team, Chengfeng Dou, Chong Liu et al.

As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the dynamic, interactive nature of medical consultations. To address this challenge, we introduce a novel dynamic verification framework that moves beyond static answer verifier, establishing a large-scale, high-fidelity interactive reinforcement learning system. Our framework comprises two key components: a Patient Simulator that creates realistic clinical environments using de-identified medical records, and a Clinical Rubrics Generator that dynamically produces multi-dimensional evaluation metrics. Building on this foundation, we develop Baichuan-M2, a 32B-parameter medical augmented reasoning model trained through a multi-stage reinforcement learning strategy with an improved Group Relative Policy Optimization (GRPO) algorithm. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts, achieving a score above 32 on the challenging HealthBench Hard benchmark-previously exceeded only by GPT-5. Our work demonstrates that robust dynamic verifier system is essential for aligning LLM capabilities with practical clinical applications, establishing a new Pareto front in the performance-parameter trade-off for medical AI deployment.

6.4CRApr 30, 2025

How to Backdoor the Knowledge Distillation

Chen Wu, Qian Ma, Prasenjit Mitra et al.

Knowledge distillation has become a cornerstone in modern machine learning systems, celebrated for its ability to transfer knowledge from a large, complex teacher model to a more efficient student model. Traditionally, this process is regarded as secure, assuming the teacher model is clean. This belief stems from conventional backdoor attacks relying on poisoned training data with backdoor triggers and attacker-chosen labels, which are not involved in the distillation process. Instead, knowledge distillation uses the outputs of a clean teacher model to guide the student model, inherently preventing recognition or response to backdoor triggers as intended by an attacker. In this paper, we challenge this assumption by introducing a novel attack methodology that strategically poisons the distillation dataset with adversarial examples embedded with backdoor triggers. This technique allows for the stealthy compromise of the student model while maintaining the integrity of the teacher model. Our innovative approach represents the first successful exploitation of vulnerabilities within the knowledge distillation process using clean teacher models. Through extensive experiments conducted across various datasets and attack settings, we demonstrate the robustness, stealthiness, and effectiveness of our method. Our findings reveal previously unrecognized vulnerabilities and pave the way for future research aimed at securing knowledge distillation processes against backdoor attacks.

3.3AIAug 16, 2025

Modeling Relational Logic Circuits for And-Inverter Graph Convolutional Network

Weihao Sun, Shikai Guo, Siwen Wang et al.

The automation of logic circuit design enhances chip performance, energy efficiency, and reliability, and is widely applied in the field of Electronic Design Automation (EDA).And-Inverter Graphs (AIGs) efficiently represent, optimize, and verify the functional characteristics of digital circuits, enhancing the efficiency of EDA development.Due to the complex structure and large scale of nodes in real-world AIGs, accurate modeling is challenging, leading to existing work lacking the ability to jointly model functional and structural characteristics, as well as insufficient dynamic information propagation capability.To address the aforementioned challenges, we propose AIGer.Specifically, AIGer consists of two components: 1) Node logic feature initialization embedding component and 2) AIGs feature learning network component.The node logic feature initialization embedding component projects logic nodes, such as AND and NOT, into independent semantic spaces, to enable effective node embedding for subsequent processing.Building upon this, the AIGs feature learning network component employs a heterogeneous graph convolutional network, designing dynamic relationship weight matrices and differentiated information aggregation approaches to better represent the original structure and information of AIGs.The combination of these two components enhances AIGer's ability to jointly model functional and structural characteristics and improves its message passing capability. Experimental results indicate that AIGer outperforms the current best models in the Signal Probability Prediction (SSP) task, improving MAE and MSE by 18.95\% and 44.44\%, respectively. In the Truth Table Distance Prediction (TTDP) task, AIGer achieves improvements of 33.57\% and 14.79\% in MAE and MSE, respectively, compared to the best-performing models.

12.5LGJun 22, 2021

Enabling Long-Term Cooperation in Cross-Silo Federated Learning: A Repeated Game Perspective

Ning Zhang, Qian Ma, Xu Chen

Cross-silo federated learning (FL) is a distributed learning approach where clients of the same interest train a global model cooperatively while keeping their local data private. The success of a cross-silo FL process requires active participation of many clients. Clients in cross-silo FL aim to optimize their long-term benefits by selfishly choosing their participation levels. While there has been some work on incentivizing clients to join FL, the analysis of clients' long-term selfish participation behaviors in cross-silo FL remains largely unexplored. In this paper, we analyze the selfish participation behaviors of heterogeneous clients in cross-silo FL. Specifically, we model clients' long-term selfish participation behaviors as an infinitely repeated game. For the stage game SPFL, we derive the unique Nash equilibrium (NE), and propose a distributed algorithm for each client to calculate its equilibrium participation strategy. We show that at the NE, clients fall into at most three categories: (i) free riders, (ii) a unique partial contributor (if exists), and (iii) contributors. For the long-term interactions among clients, we derive a cooperative strategy for clients which minimizes the number of free riders while increasing the amount of local data for model training. We show that enforced by a punishment strategy, such a cooperative strategy is a subgame perfect Nash equilibrium (SPNE) of the infinitely repeated game, under which some clients who are free riders at the NE of the stage game choose to be (partial) contributors. We further propose an algorithm to calculate the optimal SPNE which minimizes the number of free riders while maximizing the amount of local data for model training. Simulation results show that our derived optimal SPNE can effectively reduce the number of free riders by up to 99.3% and increase the amount of local data for model training by up to 82.3%.

1.2SYJun 10, 2021

Multiple Dynamic Pricing for Demand Response with Adaptive Clustering-based Customer Segmentation in Smart Grids

Fanlin Meng, Qian Ma, Zixu Liu et al.

In this paper, we propose a realistic multiple dynamic pricing approach to demand response in the retail market. First, an adaptive clustering-based customer segmentation framework is proposed to categorize customers into different groups to enable the effective identification of usage patterns. Second, customized demand models with important market constraints which capture the price-demand relationship explicitly, are developed for each group of customers to improve the model accuracy and enable meaningful pricing. Third, the multiple pricing based demand response is formulated as a profit maximization problem subject to realistic market constraints. The overall aim of the proposed scalable and practical method aims to achieve 'right' prices for 'right' customers so as to benefit various stakeholders in the system such as grid operators, customers and retailers. The proposed multiple pricing framework is evaluated via simulations based on real-world datasets.