Bei Jiang

LG
h-index16
19papers
373citations
Novelty56%
AI Score54

19 Papers

LGMay 20, 2022Code
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure

Xing Chen, Dongcui Diao, Hechang Chen et al.

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Does there exist better policies outside of this space? By using a novel surrogate objective that employs the sigmoid function (which provides an interesting way of exploration), we found that the answer is ``YES'', and the better policies are in fact located very far from the clipped space. We show that PPO is insufficient in ``off-policyness'', according to an off-policy metric called DEON. Our algorithm explores in a much larger policy space than PPO, and it maximizes the Conservative Policy Iteration (CPI) objective better than PPO during training. To the best of our knowledge, all current PPO methods have the clipping operation and optimize in the clipped policy space. Our method is the first of this kind, which advances the understanding of CPI optimization and policy gradient methods. Code is available at https://github.com/raincchio/P3O.

MLOct 5, 2022
Conformalized Fairness via Quantile Regression

Meichen Liu, Lei Ding, Dengdeng Yu et al.

Algorithmic fairness has received increased attention in socially sensitive domains. While rich literature on mean fairness has been established, research on quantile fairness remains sparse but vital. To fulfill great needs and advocate the significance of quantile fairness, we propose a novel framework to learn a real-valued quantile function under the fairness requirement of Demographic Parity with respect to sensitive attributes, such as race or gender, and thereby derive a reliable fair prediction interval. Using optimal transport and functional synchronization techniques, we establish theoretical guarantees of distribution-free coverage and exact fairness for the induced prediction interval constructed by fair quantiles. A hands-on pipeline is provided to incorporate flexible quantile regressions with an efficient fairness adjustment post-processing algorithm. We demonstrate the superior empirical performance of this approach on several benchmark datasets. Our results show the model's ability to uncover the mechanism underlying the fairness-accuracy trade-off in a wide range of societal and medical applications.

LGSep 29, 2022
How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?

Ke Sun, Bei Jiang, Linglong Kong

Distributional reinforcement learning, which focuses on learning the entire return distribution instead of only its expectation in standard RL, has demonstrated remarkable success in enhancing performance. Despite these advancements, our comprehension of how the return distribution within distributional RL still remains limited. In this study, we investigate the optimization advantages of distributional RL by utilizing its extra return distribution knowledge over classical RL within the Neural Fitted Z-Iteration~(Neural FZI) framework. To begin with, we demonstrate that the distribution loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients, which is in line with its tendency to promote optimization stability. Furthermore, the acceleration effect of distributional RL is revealed by decomposing the return distribution. It shows that distributional RL can perform favorably if the return distribution approximation is appropriate, measured by the variance of gradient estimates in each environment. Rigorous experiments validate the stable optimization behaviors of distributional RL and its acceleration effects compared to classical RL. Our research findings illuminate how the return distribution in distributional RL algorithms helps the optimization.

LGOct 31, 2022
Class Interference of Deep Neural Networks

Dongcui Diao, Hengshuai Yao, Bei Jiang

Recognizing and telling similar objects apart is even hard for human beings. In this paper, we show that there is a phenomenon of class interference with all deep neural networks. Class interference represents the learning difficulty in data, and it constitutes the largest percentage of generalization errors by deep networks. To understand class interference, we propose cross-class tests, class ego directions and interference models. We show how to use these definitions to study minima flatness and class interference of a trained model. We also show how to detect class interference during training through label dancing pattern and class dancing notes.

CRNov 9, 2023
Gaussian Differential Privacy on Riemannian Manifolds

Yangdi Jiang, Xiaotian Chang, Yi Liu et al.

We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop-Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distribution that integrates the Riemannian distance, allowing us to achieve GDP in Riemannian manifolds with bounded Ricci curvature. To the best of our knowledge, this work marks the first instance of extending the GDP framework to accommodate general Riemannian manifolds, encompassing curved spaces, and circumventing the reliance on tangent space summaries. We provide a simple algorithm to evaluate the privacy budget $μ$ on any one-dimensional manifold and introduce a versatile Markov Chain Monte Carlo (MCMC)-based algorithm to calculate $μ$ on any Riemannian manifold with constant curvature. Through simulations on one of the most prevalent manifolds in statistics, the unit sphere $S^d$, we demonstrate the superior utility of our Riemannian Gaussian mechanism in comparison to the previously proposed Riemannian Laplace mechanism for implementing GDP.

MLApr 19
Differentially Private Conformal Prediction

Jiamei Wu, Ce Zhang, Zhipeng Cai et al.

Conformal prediction (CP) has attracted broad attention as a simple and flexible framework for uncertainty quantification through prediction sets. In this work, we study how to deploy CP under differential privacy (DP) in a statistically efficient manner. We first introduce differential CP, a non-splitting conformal procedure that avoids the efficiency loss caused by data splitting and serves as a bridge between oracle CP and private conformal inference. By exploiting the stability properties of DP mechanisms, differential CP establishes a direct connection to oracle CP and inherits corresponding validity behavior. Building on this idea, we develop Differentially Private Conformal Prediction (DPCP), a fully private procedure that combines DP model training with a private quantile mechanism for calibration. We establish the end-to-end privacy guarantee of DPCP and investigate its coverage properties under additional regularity conditions. We further study the efficiency of both differential CP and DPCP under empirical risk minimization and general regression models, showing that DPCP can produce tighter prediction sets than existing private split conformal approaches under the same privacy budget. Numerical experiments on synthetic and real datasets demonstrate the practical effectiveness of the proposed methods.

LGFeb 1, 2022Code
Distributional Reinforcement Learning with Regularized Wasserstein Loss

Ke Sun, Yingnan Zhao, Wulong Liu et al.

The empirical success of distributional reinforcement learning (RL) highly relies on the choice of distribution divergence equipped with an appropriate distribution representation. In this paper, we propose \textit{Sinkhorn distributional RL (SinkhornDRL)}, which leverages Sinkhorn divergence, a regularized Wasserstein loss, to minimize the difference between current and target Bellman return distributions. Theoretically, we prove the contraction properties of SinkhornDRL, aligning with the interpolation nature of Sinkhorn divergence between Wasserstein distance and Maximum Mean Discrepancy (MMD). The introduced SinkhornDRL enriches the family of distributional RL algorithms, contributing to interpreting the algorithm behaviors compared with existing approaches by our investigation into their relationships. Empirically, we show that SinkhornDRL consistently outperforms or matches existing algorithms on the Atari games suite and particularly stands out in the multi-dimensional reward setting. \thanks{Code is available in \url{https://github.com/datake/SinkhornDistRL}.}.

MLApr 8, 2025
Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks

Enze Shi, Linglong Kong, Bei Jiang

Ensuring fairness in machine learning is a critical and challenging task, as biased data representations often lead to unfair predictions. To address this, we propose Deep Fair Learning, a framework that integrates nonlinear sufficient dimension reduction with deep learning to construct fair and informative representations. By introducing a novel penalty term during fine-tuning, our method enforces conditional independence between sensitive attributes and learned representations, addressing bias at its source while preserving predictive performance. Unlike prior methods, it supports diverse sensitive attributes, including continuous, discrete, binary, or multi-group types. Experiments on various types of data structure show that our approach achieves a superior balance between fairness and utility, significantly outperforming state-of-the-art baselines.

MLOct 27, 2025
Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis

Enze Shi, Pankaj Bhagwat, Zhixian Yang et al.

Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.

LGOct 14, 2025
Budget-constrained Active Learning to Effectively De-censor Survival Data

Ali Parsaee, Bei Jiang, Zachary Friggstad et al.

Standard supervised learners attempt to learn a model from a labeled dataset. Given a small set of labeled instances, and a pool of unlabeled instances, a budgeted learner can use its given budget to pay to acquire the labels of some unlabeled instances, which it can then use to produce a model. Here, we explore budgeted learning in the context of survival datasets, which include (right) censored instances, where we know only a lower bound on an instance's time-to-event. Here, that learner can pay to (partially) label a censored instance -- e.g., to acquire the actual time for an instance [perhaps go from (3 yr, censored) to (7.2 yr, uncensored)], or other variants [e.g., learn about one more year, so go from (3 yr, censored) to either (4 yr, censored) or perhaps (3.2 yr, uncensored)]. This serves as a model of real world data collection, where follow-up with censored patients does not always lead to uncensoring, and how much information is given to the learner model during data collection is a function of the budget and the nature of the data itself. We provide both experimental and theoretical results for how to apply state-of-the-art budgeted learning algorithms to survival data and the respective limitations that exist in doing so. Our approach provides bounds and time complexity asymptotically equivalent to the standard active learning method BatchBALD. Moreover, empirical analysis on several survival tasks show that our model performs better than other potential approaches on several benchmarks.

MEJul 9, 2025
Non-Asymptotic Analysis of Online Local Private Learning with SGD

Enze Shi, Jinhan Xie, Bei Jiang et al.

Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses have focused on non-private optimization methods, and hence are not applicable to privacy-preserving optimization problems. This work initiates the analysis to bridge this gap and opens the door to non-asymptotic convergence analysis of private optimization problems. A general framework is investigated for the online LDP model in stochastic optimization problems. We assume that sensitive information from individuals is collected sequentially and aim to estimate, in real-time, a static parameter that pertains to the population of interest. Most importantly, we conduct a comprehensive non-asymptotic convergence analysis of the proposed estimators in finite-sample situations, which gives their users practical guidelines regarding the effect of various hyperparameters, such as step size, parameter dimensions, and privacy budgets, on convergence rates. Our proposed estimators are validated in the theoretical and practical realms by rigorous mathematical derivations and carefully constructed numerical experiments.

MLMar 19, 2025
Online federated learning framework for classification

Wenxing Guo, Jinhan Xie, Jianya Lu et al.

In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a new optimization algorithm based on the Majorization-Minimization principle, integrated with a renewable estimation procedure, enabling efficient model updates without full retraining. We provide a theoretical guarantee for the convergence of our estimator, proving its consistency and asymptotic normality under standard regularity conditions. In addition, we establish that our method achieves Bayesian risk consistency, ensuring its reliability for classification tasks in federated environments. We further incorporate differential privacy mechanisms to enhance data security, protecting client information while maintaining model performance. Extensive numerical experiments on both simulated and real-world datasets demonstrate that our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.

MLMar 11, 2025
A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation

Forough Fazeliasl, Michael Minyi Zhang, Bei Jiang et al.

Mutual Information (MI) is a crucial measure for capturing dependencies between variables, but exact computation is challenging in high dimensions with intractable likelihoods, impacting accuracy and robustness. One idea is to use an auxiliary neural network to train an MI estimator; however, methods based on the empirical distribution function (EDF) can introduce sharp fluctuations in the MI loss due to poor out-of-sample performance, destabilizing convergence. We present a Bayesian nonparametric (BNP) solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization in the training process. With this regularization, the MI loss integrates both prior knowledge and empirical data to reduce the loss sensitivity to fluctuations and outliers in the sample data, especially in small sample settings like mini-batches. This approach addresses the challenge of balancing accuracy and low variance by effectively reducing variance, leading to stabilized and robust MI loss gradients during training and enhancing the convergence of the MI approximation while offering stronger theoretical guarantees for convergence. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder. Experimental results demonstrate significant improvements in convergence over EDF-based methods, with applications across synthetic and real datasets, notably in 3D CT image generation, yielding enhanced structure discovery and reduced overfitting in data synthesis. While this paper focuses on generative models in application, the proposed estimator is not restricted to this setting and can be applied more broadly in various BNP learning procedures.

CLDec 9, 2021
Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

Lei Ding, Dengdeng Yu, Jinhan Xie et al.

With widening deployments of natural language processing (NLP) in daily life, inherited social biases from NLP models have become more severe and problematic. Previous studies have shown that word embeddings trained on human-generated corpora have strong gender biases that can produce discriminative results in downstream tasks. Previous debiasing methods focus mainly on modeling bias and only implicitly consider semantic information while completely overlooking the complex underlying causal structure among bias and semantic components. To address these issues, we propose a novel methodology that leverages a causal inference framework to effectively remove gender bias. The proposed method allows us to construct and analyze the complex causal mechanisms facilitating gender information flow while retaining oracle semantic information within word embeddings. Our comprehensive experiments show that the proposed method achieves state-of-the-art results in gender-debiasing tasks. In addition, our methods yield better performance in word similarity evaluation and various extrinsic downstream NLP tasks.

LGOct 17, 2021
Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

Ke Sun, Yafei Wang, Yi Liu et al.

Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms for accelerating convergence and improving the sampling efficiency of deep RL. Despite its heuristic improvement of convergence, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The key focus of the analysis roots in the fixed-point iteration nature of RL. We further propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.

LGOct 7, 2021
Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Ke Sun, Yingnan Zhao, Enze Shi et al.

The remarkable empirical performance of distributional reinforcement learning (RL) has garnered increasing attention to understanding its theoretical advantages over classical RL. By decomposing the categorical distributional loss commonly employed in distributional RL, we find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization. This less-studied entropy regularization aims to capture additional knowledge of return distribution beyond only its expectation, contributing to an augmented reward signal in policy optimization. In contrast to the vanilla entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the novel entropy regularization derived from categorical distributional loss implicitly updates policies to align the learned policy with (estimated) environmental uncertainty. Finally, extensive experiments verify the significance of this uncertainty-aware regularization from distributional RL on the empirical benefits over classical RL. Our study offers an innovative exploration perspective to explain the intrinsic benefits of distributional learning in RL.

CVMay 31, 2021
Similarity Embedding Networks for Robust Human Activity Recognition

Chenglin Li, Carrie Lu Tong, Di Niu et al.

Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.

SPMay 31, 2021
Meta-HAR: Federated Representation Learning for Human Activity Recognition

Chenglin Li, Di Niu, Bei Jiang et al.

Human activity recognition (HAR) based on mobile sensors plays an important role in ubiquitous computing. However, the rise of data regulatory constraints precludes collecting private and labeled signal data from personal devices at scale. Federated learning has emerged as a decentralized alternative solution to model training, which iteratively aggregates locally updated models into a shared global model, therefore being able to leverage decentralized, private data without central collection. However, the effectiveness of federated learning for HAR is affected by the fact that each user has different activity types and even a different signal distribution for the same activity type. Furthermore, it is uncertain if a single global model trained can generalize well to individual users or new users with heterogeneous data. In this paper, we propose Meta-HAR, a federated representation learning framework, in which a signal embedding network is meta-learned in a federated manner, while the learned signal representations are further fed into a personalized classification network at each user for activity prediction. In order to boost the representation ability of the embedding network, we treat the HAR problem at each user as a different task and train the shared embedding network through a Model-Agnostic Meta-learning framework, such that the embedding network can generalize to any individual user. Personalization is further achieved on top of the robustly learned representations in an adaptation procedure. We conducted extensive experiments based on two publicly available HAR datasets as well as a newly created HAR dataset. Results verify that Meta-HAR is effective at maintaining high test accuracies for individual users, including new users, and significantly outperforms several baselines, including Federated Averaging, Reptile and even centralized learning in certain cases.

LGApr 27, 2018
Negative Log Likelihood Ratio Loss for Deep Neural Network Classification

Donglai Zhu, Hengshuai Yao, Bei Jiang et al.

In deep neural network, the cross-entropy loss function is commonly used for classification. Minimizing cross-entropy is equivalent to maximizing likelihood under assumptions of uniform feature and class distributions. It belongs to generative training criteria which does not directly discriminate correct class from competing classes. We propose a discriminative loss function with negative log likelihood ratio between correct and competing classes. It significantly outperforms the cross-entropy loss on the CIFAR-10 image classification task.