AIJun 16, 2022Code
Identifying Electrocardiogram Abnormalities Using a Handcrafted-Rule-Enhanced Neural NetworkYuexin Bian, Jintai Chen, Xiaojun Chen et al.
A large number of people suffer from life-threatening cardiac abnormalities, and electrocardiogram (ECG) analysis is beneficial to determining whether an individual is at risk of such abnormalities. Automatic ECG classification methods, especially the deep learning based ones, have been proposed to detect cardiac abnormalities using ECG records, showing good potential to improve clinical diagnosis and help early prevention of cardiovascular diseases. However, the predictions of the known neural networks still do not satisfactorily meet the needs of clinicians, and this phenomenon suggests that some information used in clinical diagnosis may not be well captured and utilized by these methods. In this paper, we introduce some rules into convolutional neural networks, which help present clinical knowledge to deep learning based ECG analysis, in order to improve automated ECG diagnosis performance. Specifically, we propose a Handcrafted-Rule-enhanced Neural Network (called HRNN) for ECG classification with standard 12-lead ECG input, which consists of a rule inference module and a deep learning module. Experiments on two large-scale public ECG datasets show that our new approach considerably outperforms existing state-of-the-art methods. Further, our proposed approach not only can improve the diagnosis performance, but also can assist in detecting mislabelled ECG samples. Our codes are available at https://github.com/alwaysbyx/ecg_processing.
LGJun 17, 2025Code
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain PerspectiveZhoujun Cheng, Shibo Hao, Tianyang Liu et al.
Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpus of 92K verifiable examples spanning six reasoning domains--Math, Code, Science, Logic, Simulation, and Tabular--each built through domain-specific reward design, deduplication, and filtering to ensure reliability and effectiveness for RL training. Based on Guru, we systematically revisit established findings in RL for LLM reasoning and observe significant variation across domains. For example, while prior work suggests that RL primarily elicits existing knowledge from pretrained models, our results reveal a more nuanced pattern: domains frequently seen during pretraining (Math, Code, Science) easily benefit from cross-domain RL training, while domains with limited pretraining exposure (Logic, Simulation, and Tabular) require in-domain training to achieve meaningful performance gains, suggesting that RL is likely to facilitate genuine skill acquisition. Finally, we present Guru-7B and Guru-32B, two models that achieve state-of-the-art performance among open models RL-trained with publicly available data, outperforming best baselines by 7.9% and 6.7% on our 17-task evaluation suite across six reasoning domains. We also show that our models effectively improve the Pass@k performance of their base models, particularly on complex tasks less likely to appear in pretraining data. We release data, models, training and evaluation code to facilitate general-purpose reasoning at: https://github.com/LLM360/Reasoning360
LGApr 21
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement LearningYuan Zhuang, Yuexin Bian, Sihong He et al.
Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our approach freezes randomly initialized base matrices and solely optimizes low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. Built on top of SimbaV2, we further develop a LoRA formulation, compatible with SimbaV2, that preserves its hyperspherical normalization geometry under frozen-backbone training. We evaluate our method with SAC and FastTD3 on DeepMind Control locomotion and IsaacLab robotics benchmarks. LoRA consistently achieves lower critic loss during training and stronger policy performance. Extensive experiments demonstrate that adaptive low-rank updates provide a simple, scalable, and effective structural regularization for critic learning in off-policy RL.
SYMay 18, 2024Code
PDE Control Gym: A Benchmark for Data-Driven Boundary Control of Partial Differential EquationsLuke Bhan, Yuexin Bian, Miroslav Krstic et al.
Over the last decade, data-driven methods have surged in popularity, emerging as valuable tools for control theory. As such, neural network approximations of control feedback laws, system dynamics, and even Lyapunov functions have attracted growing attention. With the ascent of learning based control, the need for accurate, fast, and easy-to-use benchmarks has increased. In this work, we present the first learning-based environment for boundary control of PDEs. In our benchmark, we introduce three foundational PDE problems - a 1D transport PDE, a 1D reaction-diffusion PDE, and a 2D Navier-Stokes PDE - whose solvers are bundled in an user-friendly reinforcement learning gym. With this gym, we then present the first set of model-free, reinforcement learning algorithms for solving this series of benchmark problems, achieving stability, although at a higher cost compared to model-based PDE backstepping. With the set of benchmark environments and detailed examples, this work significantly lowers the barrier to entry for learning-based PDE control - a topic largely unexplored by the data-driven control community. The entire benchmark is available on Github along with detailed documentation and the presented reinforcement learning models are open sourced.
LGJan 30
RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement LearningYuexin Bian, Jie Feng, Tao Wang et al.
On-policy deep reinforcement learning remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies, often leading to brittle optimization when gradients are noisy and policy updates must be conservative. In this paper, we revisit policy representation as a first-class design choice for on-policy optimization. We study discretized categorical actors that represent each action dimension with a distribution over bins, yielding a policy objective that resembles a cross-entropy loss. Building on architectural advances from supervised learning, we further propose regularized actor networks, while keeping critic design fixed. Our results show that simply replacing the standard actor network with our discretized regularized actor yields consistent gains and achieve the state-of-the-art performance across diverse continuous-control benchmarks.
SYMay 15, 2024
Improving Sequential Market Coordination via Value-oriented Renewable Energy ForecastingYufan Zhang, Honglin Wen, Yuexin Bian et al.
Large penetration of renewable energy sources (RESs) brings huge uncertainty into the electricity markets. The current deterministic clearing approach in the day-ahead (DA) market, where RESs participate based on expected production, has been criticized for causing a lack of coordination between the DA and real-time (RT) markets, leading to high overall operating costs. Previous works indicate that improving day-ahead RES entering quantities can significantly mitigate the drawbacks of deterministic clearing. In this work, we propose using a trained forecasting model, referred to as value-oriented forecasting, to determine RES Improved Entering Quantities (RIEQ) more efficiently during the operational phase. Unlike traditional models that minimize statistical forecasting errors, our approach trains model parameters to minimize the expected overall operating costs across both DA and RT markets. We derive the exact form of the loss function used for training, which becomes piecewise linear when market clearing is modeled by linear programs. Additionally, we provide the analytical gradient of the loss function with respect to the forecast, enabling an efficient training strategy. Numerical studies demonstrate that our forecasts significantly reduce overall operating costs for deterministic market clearing compared to conventional forecasts based on expected RES production.
CLSep 30, 2025
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA ExpertsYuan Zhuang, Yi Shen, Yuexin Bian et al.
Recent studies have shown that combining parameter-efficient fine-tuning (PEFT) with mixture-of-experts (MoE) is an effective strategy for adapting large language models (LLMs) to the downstream tasks. However, most existing approaches rely on conventional TopK routing, which requires careful hyperparameter tuning and assigns a fixed number of experts to each token. In this work, we propose LD-MoLE, a Learnable Dynamic routing mechanism for Mixture of LoRA Experts that enables adaptive, token-dependent, and layer-wise expert allocation. Our method replaces the non-differentiable TopK selection with a differentiable routing function and a closed-form solution. Moreover, our design allows the model to adaptively determine the number of experts to activate for each token at different layers. In addition, we introduce an analytical sparsity control objective to regularize the number of activated experts. Extensive experiments on the Qwen3-1.7B and Llama-3.2-3B models show that LD-MoLE achieves the highest average scores compared to state-of-the-art baselines, across a diverse set of benchmarks. Our method not only achieves superior performance, but also demonstrates the ability to learn token-dependent and layer-wise expert allocation.
SYApr 30, 2025
Operator learning for energy-efficient building ventilation control with computational fluid dynamics simulation of a real-world classroomYuexin Bian, Oliver Schmidt, Yuanyuan Shi
Energy-efficient ventilation control plays a vital role in reducing building energy consumption while ensuring occupant health and comfort. While Computational Fluid Dynamics (CFD) simulations provide detailed and physically accurate representation of indoor airflow, their high computational cost limits their use in real-time building control. In this work, we present a neural operator learning framework that combines the physical accuracy of CFD with the computational efficiency of machine learning to enable building ventilation control with the high-fidelity fluid dynamics models. Our method jointly optimizes the airflow supply rates and vent angles to reduce energy use and adhere to air quality constraints. We train an ensemble of neural operator transformer models to learn the mapping from building control actions to airflow fields using high-resolution CFD data. This learned neural operator is then embedded in an optimization-based control framework for building ventilation control. Experimental results show that our approach achieves significant energy savings compared to maximum airflow rate control, rule-based control, as well as data-driven control methods using spatially averaged CO2 prediction and deep learning based reduced order model, while consistently maintaining safe indoor air quality. These results highlight the practicality and scalability of our method in maintaining energy efficiency and indoor air quality in real-world buildings.