Weidong Huang

LG
h-index23
14papers
578citations
Novelty31%
AI Score47

14 Papers

AIOct 19, 2023
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

Jiaming Ji, Borong Zhang, Jiayi Zhou et al. · pku

Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.

LGJul 14, 2023Code
SafeDreamer: Safe Reinforcement Learning with World Models

Weidong Huang, Jiaming Ji, Chunhe Xia et al.

The deployment of Reinforcement Learning (RL) in real-world applications is constrained by its failure to satisfy safety criteria. Existing Safe Reinforcement Learning (SafeRL) methods, which rely on cost functions to enforce safety, often fail to achieve zero-cost performance in complex scenarios, especially vision-only tasks. These limitations are primarily due to model inaccuracies and inadequate sample efficiency. The integration of the world model has proven effective in mitigating these shortcomings. In this work, we introduce SafeDreamer, a novel algorithm incorporating Lagrangian-based methods into world model planning processes within the superior Dreamer framework. Our method achieves nearly zero-cost performance on various tasks, spanning low-dimensional and vision-only input, within the Safety-Gymnasium benchmark, showcasing its efficacy in balancing performance and safety in RL tasks. Further details can be found in the code repository: \url{https://github.com/PKU-Alignment/SafeDreamer}.

CVJan 1
Towards Automated Differential Diagnosis of Skin Diseases Using Deep Learning and Imbalance-Aware Strategies

Ali Anaissi, Ali Braytee, Weidong Huang et al.

As dermatological conditions become increasingly common and the availability of dermatologists remains limited, there is a growing need for intelligent tools to support both patients and clinicians in the timely and accurate diagnosis of skin diseases. In this project, we developed a deep learning based model for the classification and diagnosis of skin conditions. By leveraging pretraining on publicly available skin disease image datasets, our model effectively extracted visual features and accurately classified various dermatological cases. Throughout the project, we refined the model architecture, optimized data preprocessing workflows, and applied targeted data augmentation techniques to improve overall performance. The final model, based on the Swin Transformer, achieved a prediction accuracy of 87.71 percent across eight skin lesion classes on the ISIC2019 dataset. These results demonstrate the model's potential as a diagnostic support tool for clinicians and a self assessment aid for patients.

QMJan 1
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics

Ali Anaissi, Seid Miad Zandavi, Weidong Huang et al.

Single-cell data analysis has the potential to revolutionize personalized medicine by characterizing disease-associated molecular changes at the single-cell level. Advanced single-cell multimodal assays can now simultaneously measure various molecules (e.g., DNA, RNA, Protein) across hundreds of thousands of individual cells, providing a comprehensive molecular readout. A significant analytical challenge is integrating single-cell measurements across different modalities. Various methods have been developed to address this challenge, but there has been no systematic evaluation of these techniques with different preprocessing strategies. This study examines a general pipeline for single-cell data analysis, which includes normalization, data integration, and dimensionality reduction. The performance of different algorithm combinations often depends on the dataset sizes and characteristics. We evaluate six datasets across diverse modalities, tissues, and organisms using three metrics: Silhouette Coefficient Score, Adjusted Rand Index, and Calinski-Harabasz Index. Our experiments involve combinations of seven normalization methods, four dimensional reduction methods, and five integration methods. The results show that Seurat and Harmony excel in data integration, with Harmony being more time-efficient, especially for large datasets. UMAP is the most compatible dimensionality reduction method with the integration techniques, and the choice of normalization method varies depending on the integration method used.

MESep 16, 2025Code
Augmenting Limited and Biased RCTs through Pseudo-Sample Matching-Based Observational Data Fusion Method

Kairong Han, Weidong Huang, Taiyang Zhou et al.

In the online ride-hailing pricing context, companies often conduct randomized controlled trials (RCTs) and utilize uplift models to assess the effect of discounts on customer orders, which substantially influences competitive market outcomes. However, due to the high cost of RCTs, the proportion of trial data relative to observational data is small, which only accounts for 0.65\% of total traffic in our context, resulting in significant bias when generalizing to the broader user base. Additionally, the complexity of industrial processes reduces the quality of RCT data, which is often subject to heterogeneity from potential interference and selection bias, making it difficult to correct. Moreover, existing data fusion methods are challenging to implement effectively in complex industrial settings due to the high dimensionality of features and the strict assumptions that are hard to verify with real-world data. To address these issues, we propose an empirical data fusion method called pseudo-sample matching. By generating pseudo-samples from biased, low-quality RCT data and matching them with the most similar samples from large-scale observational data, the method expands the RCT dataset while mitigating its heterogeneity. We validated the method through simulation experiments, conducted offline and online tests using real-world data. In a week-long online experiment, we achieved a 0.41\% improvement in profit, which is a considerable gain when scaled to industrial scenarios with hundreds of millions in revenue. In addition, we discuss the harm to model training, offline evaluation, and online economic benefits when the RCT data quality is not high, and emphasize the importance of improving RCT data quality in industrial scenarios. Further details of the simulation experiments can be found in the GitHub repository https://github.com/Kairong-Han/Pseudo-Matching.

LGMay 16, 2023Code
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research

Jiaming Ji, Jiayi Zhou, Borong Zhang et al.

AI systems empowered by reinforcement learning (RL) algorithms harbor the immense potential to catalyze societal advancement, yet their deployment is often impeded by significant safety concerns. Particularly in safety-critical applications, researchers have raised concerns about unintended harms or unsafe behaviors of unaligned RL agents. The philosophy of safe reinforcement learning (SafeRL) is to align RL agents with harmless intentions and safe behavioral patterns. In SafeRL, agents learn to develop optimal policies by receiving feedback from the environment, while also fulfilling the requirement of minimizing the risk of unintended harm or unsafe behavior. However, due to the intricate nature of SafeRL algorithm implementation, combining methodologies across various domains presents a formidable challenge. This had led to an absence of a cohesive and efficacious learning framework within the contemporary SafeRL research milieu. In this work, we introduce a foundational framework designed to expedite SafeRL research endeavors. Our comprehensive framework encompasses an array of algorithms spanning different RL domains and places heavy emphasis on safety elements. Our efforts are to make the SafeRL-related research process more streamlined and efficient, therefore facilitating further research in AI safety. Our project is released at: https://github.com/PKU-Alignment/omnisafe.

LGMar 3, 2025
Differentiable Information Enhanced Model-Based Reinforcement Learning

Xiaoyuan Zhang, Xinyan Cai, Bo Liu et al. · pku

Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information that facilitates gradient-based methods. In comparison to prevailing model-free reinforcement learning approaches, model-based reinforcement learning (MBRL) methods exhibit the potential to effectively harness the power of differentiable information for recovering the underlying physical dynamics. However, this presents two primary challenges: effectively utilizing differentiable information to 1) construct models with more accurate dynamic prediction and 2) enhance the stability of policy training. In this paper, we propose a Differentiable Information Enhanced MBRL method, MB-MIX, to address both challenges. Firstly, we adopt a Sobolev model training approach that penalizes incorrect model gradient outputs, enhancing prediction accuracy and yielding more precise models that faithfully capture system dynamics. Secondly, we introduce mixing lengths of truncated learning windows to reduce the variance in policy gradient estimation, resulting in improved stability during policy learning. To validate the effectiveness of our approach in differentiable environments, we provide theoretical analysis and empirical results. Notably, our approach outperforms previous model-based and model-free methods, in multiple challenging tasks involving controllable rigid robots such as humanoid robots' motion control and deformable object manipulation.

LGNov 21, 2025
A Hybrid Computational Intelligence Framework for scRNA-seq Imputation: Integrating scRecover and Random Forests

Ali Anaissi, Deshao Liu, Yuanzhe Jia et al.

Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at cellular resolution but suffers from pervasive dropout events that obscure biological signals. We present SCR-MF, a modular two-stage workflow that combines principled dropout detection using scRecover with robust non-parametric imputation via missForest. Across public and simulated datasets, SCR-MF achieves robust and interpretable performance comparable to or exceeding existing imputation methods in most cases, while preserving biological fidelity and transparency. Runtime analysis demonstrates that SCR-MF provides a competitive balance between accuracy and computational efficiency, making it suitable for mid-scale single-cell datasets.

LGMar 20, 2025
FedSAF: A Federated Learning Framework for Enhanced Gastric Cancer Detection and Privacy Preservation

Yuxin Miao, Xinyuan Yang, Hongda Fan et al.

Gastric cancer is one of the most commonly diagnosed cancers and has a high mortality rate. Due to limited medical resources, developing machine learning models for gastric cancer recognition provides an efficient solution for medical institutions. However, such models typically require large sample sizes for training and testing, which can challenge patient privacy. Federated learning offers an effective alternative by enabling model training across multiple institutions without sharing sensitive patient data. This paper addresses the limited sample size of publicly available gastric cancer data with a modified data processing method. This paper introduces FedSAF, a novel federated learning algorithm designed to improve the performance of existing methods, particularly in non-independent and identically distributed (non-IID) data scenarios. FedSAF incorporates attention-based message passing and the Fisher Information Matrix to enhance model accuracy, while a model splitting function reduces computation and transmission costs. Hyperparameter tuning and ablation studies demonstrate the effectiveness of this new algorithm, showing improvements in test accuracy on gastric cancer datasets, with FedSAF outperforming existing federated learning methods like FedAMP, FedAvg, and FedProx. The framework's robustness and generalization ability were further validated across additional datasets (SEED, BOT, FashionMNIST, and CIFAR-10), achieving high performance in diverse environments.

LGApr 15, 2021
Facilitating Machine Learning Model Comparison and Explanation Through A Radial Visualisation

Jianlong Zhou, Weidong Huang, Fang Chen

Building an effective Machine Learning (ML) model for a data set is a difficult task involving various steps. One of the most important steps is to compare generated substantial amounts of ML models to find the optimal one for the deployment. It is challenging to compare such models with dynamic number of features. Comparison is more than just finding differences of ML model performance, users are also interested in the relations between features and model performance such as feature importance for ML explanations. This paper proposes RadialNet Chart, a novel visualisation approach to compare ML models trained with a different number of features of a given data set while revealing implicit dependent relations. In RadialNet Chart, ML models and features are represented by lines and arcs respectively. These lines are generated effectively using a recursive function. The dependence of ML models with dynamic number of features is encoded into the structure of visualisation, where ML models and their dependent features are directly revealed from related line connections. ML model performance information is encoded with colour and line width in RadialNet Chart. Together with the structure of visualisation, feature importance can be directly discerned in RadialNet Chart for ML explanations.

CVJun 21, 2018
Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Jiaxiang Wu, Weidong Huang, Junzhou Huang et al.

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error compensated quantized stochastic gradient descent algorithm to improve the training efficiency. Local gradients are quantized to reduce the communication overhead, and accumulated quantization error is utilized to speed up the convergence. Furthermore, we present theoretical analysis on the convergence behaviour, and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by a factor of up to two magnitudes without performance degradation.

HCSep 25, 2016
Structure Based Aesthetics and Support of Cognitive Tasks for Graph Evaluation

Weidong Huang

Drawing principles, or aesthetics, are important in graph drawing. They are used as criteria for algorithm design and for quality evaluation. Current aesthetics are described as visual properties that a drawing is required to have to be visually pleasing. However, most of these aesthetics are originally proposed without consideration of graph structure information. Therefore their ability in visually revealing graph structural features are not guaranteed and indeed mixed results have been reported in the literature regarding their impact on user graph comprehension. In this paper, we propose to derive aesthetics based on graph internal structural features. Further, graphs are often evaluated based on controlled experiments with simple perception tasks to avoid possible confounding factors caused by complex tasks. This leaves their value in supporting complex tasks unevaluated. To fill this gap, we also discuss the possibility of applying evaluation methodologies used in the Cognitive Load Theory research for graph evaluation.

HCJul 11, 2013
Designing a Network Based System for Delivery of Remote Mine Services

Craig James, Weidong Huang, Kazys Stepanas et al.

There is a great body of work in the areas of tele-assistance/tele-collaboration offering novel and effective ways to improve collaboration between personnel located at a remote mine site and off-site personnel located in major metropolitan areas. Much of this work involves the use of high-bandwidth communications or targeted sensory experiences using large format displays. There are also existing remote access technologies but these suffer from limited functionality (providing text, voice, video or one-way desktop sharing), are often poorly supported in the security-conscious corporate environment and require complicated set up processes. There is currently no singular piece of remote collaboration technology that is suitable for the delivery of high-quality planning and scheduling services to clients at a mining site from a remote operating centre. In response to this issue, as part of a research and technology development effort between CSIRO and a mining engineering firm, we have developed a concept of remote mining engineer (RME) and conducted a functional requirements analysis for delivering mining engineering services to mine sites remotely. Based on the obtained requirements, a further study was performed to characterise existing technologies and to identify the scope for future work in designing and prototyping a network based system for RME. We report on the method and findings of this study in this paper.

HCJun 11, 2013
An Aggregation-Based Overall Quality Measurement for Visualization

Weidong Huang

Aesthetics are often used to evaluate the quality of graph drawings. However, the existing aesthetic criteria are useful in judging the extents to which a drawing conforms to particular drawing rules. They have limitations in evaluating overall quality. Currently the overall quality of graph drawings is mainly evaluated based on personal judgments and user studies. Personal judgments are not reliable, while user studies can be costly to run. Therefore, there is a need for a direct measure of overall quality. This measure can be used by visualization designers to quickly compare the quality of drawings at hand at the design stage and make decisions accordingly. In an attempt to meet this need, we propose a measure that measures overall quality based on aggregation of individual aesthetic criteria. We present a user study that validates this measure and demonstrates its capacity in predicting the performance of human graph comprehension. The implications of the proposed measure for future research are discussed.