CR Sep 8, 2022
Majority Vote for Distributed Differentially Private Sign Selection
Weidong Liu, Jiyuan Tu, Xiaojun Mao et al.
Privacy-preserving data analysis has become more prevalent in recent years. In this study, we propose a distributed group differentially private Majority Vote mechanism for the sign selection problem in a distributed setup. To achieve this, we apply iterative peeling to the stability function and use the exponential mechanism to recover the signs. For enhanced applicability, we study private sign selection for mean estimation and linear regression problems in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, which improves on contemporary work on private variable selection. Moreover, the sign selection consistency is justified by theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of the proposed method.
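The exponential mechanism mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not the paper's peeling-based procedure: the candidate labels, the scores, and the function names below are hypothetical, and the sampling rule is the textbook one, where a candidate with score u is chosen with probability proportional to exp(epsilon * u / (2 * sensitivity)).

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities of the textbook exponential mechanism.

    scores: dict mapping each candidate to its utility score (hypothetical input).
    """
    scale = epsilon / (2.0 * sensitivity)
    # Subtracting the maximum score avoids overflow and leaves the
    # normalized probabilities unchanged.
    max_score = max(scores.values())
    weights = {c: math.exp(scale * (s - max_score)) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def exponential_mechanism(scores, epsilon, sensitivity, rng=random):
    """Draw one candidate according to the probabilities above."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Hypothetical scores for the three possible signs of one coordinate.
sign_scores = {"+": 3.0, "0": 1.0, "-": 0.0}
chosen = exponential_mechanism(sign_scores, epsilon=1.0, sensitivity=1.0,
                               rng=random.Random(0))
```

A larger privacy budget epsilon concentrates the probability mass on the highest-scoring candidate, while a smaller one pushes the draw toward uniform.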
ML Jun 17, 2023
Distributed Semi-Supervised Sparse Statistical Inference
Jiyuan Tu, Weidong Liu, Xiaojun Mao et al.
The debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters. However, constructing such an estimator involves estimating the high-dimensional inverse Hessian matrix, incurring significant computational costs. This challenge becomes particularly acute in distributed setups, where traditional methods necessitate computing a debiased estimator on every machine. This becomes unwieldy, especially with a large number of machines. In this paper, we delve into semi-supervised sparse statistical inference in a distributed setup. An efficient multi-round distributed debiased estimator, which integrates both labeled and unlabeled data, is developed. We show that the additional unlabeled data helps to improve the statistical rate of each round of iteration. Our approach offers tailored debiasing methods for $M$-estimation and generalized linear models according to the specific form of the loss function. Our method also applies to a non-smooth loss such as the absolute deviation loss. Furthermore, our algorithm is computationally efficient since it requires only one estimation of a high-dimensional inverse covariance matrix. We demonstrate the effectiveness of our method by presenting simulation studies and real data applications that highlight the benefits of incorporating unlabeled data.
ML Feb 20, 2023
Transductive Matrix Completion with Calibration for Multi-Task Learning
Hengfang Wang, Yasi Zhang, Xiaojun Mao et al.
Multi-task learning has attracted much attention due to growing multi-purpose research with multiple related data sources. Moreover, transduction with matrix completion is a useful method in multi-label learning. In this paper, we propose a transductive matrix completion algorithm that incorporates a calibration constraint for the features under the multi-task learning framework. The proposed algorithm recovers the incomplete feature matrix and target matrix simultaneously. Fortunately, the calibration information improves the completion results. In particular, we provide a statistical guarantee for the proposed algorithm, and the theoretical improvement induced by calibration information is also studied. Moreover, the proposed algorithm enjoys a sub-linear convergence rate. Several synthetic data experiments are conducted, which show that the proposed algorithm outperforms other existing methods, especially when the target matrix is associated with the feature matrix in a nonlinear way.
ML Jan 2, 2024
Efficient Sparse Least Absolute Deviation Regression with Differential Privacy
Weidong Liu, Xiaojun Mao, Xiaofei Zhang et al.
In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, in the literature, most privacy-preserving algorithms require learning objectives to be strongly convex and Lipschitz smooth, and thus cannot cover a wide class of robust loss functions (e.g., quantile/least absolute loss). In this work, we aim to develop a fast privacy-preserving learning solution for a sparse robust regression problem. Our learning loss consists of a robust least absolute loss and an $\ell_1$ sparse penalty term. To solve the non-smooth loss quickly under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation regression. Our algorithm achieves fast estimation by reformulating the sparse LAD problem as a penalized least squares estimation problem and adopts a three-stage noise injection to guarantee $(ε,δ)$-differential privacy. We show that our algorithm can achieve a better trade-off between privacy and statistical accuracy compared with state-of-the-art privacy-preserving regression algorithms. Finally, we conduct experiments to verify the efficiency of our proposed FRAPPE algorithm.
LG Jan 31, 2025
A Bias-Correction Decentralized Stochastic Gradient Algorithm with Momentum Acceleration
Yuchen Hu, Xi Chen, Weidong Liu et al.
Distributed stochastic optimization algorithms can simultaneously process large-scale datasets, significantly accelerating model training. However, their effectiveness is often hindered by the sparsity of distributed networks and data heterogeneity. In this paper, we propose a momentum-accelerated distributed stochastic gradient algorithm, termed Exact-Diffusion with Momentum (EDM), which mitigates the bias from data heterogeneity and incorporates momentum techniques commonly used in deep learning to enhance the convergence rate. Our theoretical analysis demonstrates that, when applied to non-convex objective functions, the EDM algorithm converges sub-linearly to a neighborhood of the optimal solution whose radius is independent of data heterogeneity; under the Polyak-Lojasiewicz condition, which is a weaker assumption than strong convexity, it converges linearly to the target region. The analysis techniques we employ to handle momentum in complex distributed parameter update structures yield a sufficiently tight convergence upper bound, offering a new perspective for the theoretical analysis of other momentum-based distributed algorithms.
ML Feb 11, 2022
Fast and Robust Sparsity Learning over Networks: A Decentralized Surrogate Median Regression Approach
Weidong Liu, Xiaojun Mao, Xin Zhang
Decentralized sparsity learning has attracted a significant amount of attention recently due to its rapidly growing applications. To obtain robust and sparse estimators, a natural idea is to adopt the non-smooth median loss combined with an $\ell_1$ sparsity regularizer. However, most of the existing methods suffer from slow convergence caused by the doubly non-smooth objective. To accelerate the computation, in this paper, we propose a decentralized surrogate median regression (deSMR) method for efficiently solving the decentralized sparsity learning problem. We show that our proposed algorithm enjoys a linear convergence rate with a simple implementation. We also investigate the statistical guarantee and show that our proposed estimator achieves a near-oracle convergence rate without any restriction on the number of network nodes. Moreover, we establish theoretical results for sparse support recovery. Thorough numerical experiments and a real data study are provided to demonstrate the effectiveness of our method.
LG Jan 18, 2022
Nonparametric Feature Selection by Random Forests and Deep Neural Networks
Xiaojun Mao, Liuhua Peng, Zhonglei Wang
Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature selection algorithm that incorporates random forests and deep neural networks, and its theoretical properties are also investigated under regularity conditions. Using different synthetic models and a real-world example, we demonstrate the advantage of the proposed algorithm over other alternatives in terms of identifying useful features, avoiding useless ones, and computational efficiency. Although the algorithm is proposed using standard random forests, it can be widely adapted to other machine learning algorithms, as long as features can be sorted accordingly.
CR Oct 2, 2021
One-Bit Matrix Completion with Differential Privacy
Zhengpin Li, Zheng Wei, Zengfeng Huang et al.
As a prevailing collaborative filtering method for recommendation systems, one-bit matrix completion requires data collected from users to provide personalized service. Due to insidious attacks and unexpected inference, the release of users' data often raises serious privacy concerns. To address this issue, differential privacy (DP) has been widely used in standard matrix completion models. To date, however, little has been known about how to apply DP to achieve privacy protection in one-bit matrix completion. In this paper, we propose a unified framework for ensuring a strong privacy guarantee for one-bit matrix completion with DP. In our framework, we develop four different private perturbation mechanisms corresponding to different stages of one-bit matrix completion. For each mechanism, we design a privacy-preserving algorithm and provide a theoretical recovery error bound under proper conditions. Numerical experiments on synthetic and real-world datasets demonstrate the effectiveness of our proposal. Compared to one-bit matrix completion without privacy protection, our proposed mechanisms can maintain high-level privacy protection with a marginal loss of completion accuracy.
LG Oct 1, 2021
Applying Differential Privacy to Tensor Completion
Zheng Wei, Zhengpin Li, Xiaojun Mao et al.
Tensor completion aims at filling the missing or unobserved entries based on partially observed tensors. However, utilization of the observed tensors often raises serious privacy concerns in many practical scenarios. To address this issue, we propose a solid and unified framework that contains several approaches for applying differential privacy to the two most widely used tensor decomposition methods: i) CANDECOMP/PARAFAC (CP) and ii) Tucker decompositions. For each approach, we establish a rigorous privacy guarantee and meanwhile evaluate the privacy-accuracy trade-off. Experiments on synthetic and real-world datasets demonstrate that our proposal achieves high accuracy for tensor completion while ensuring strong privacy protections.
IR Oct 1, 2021
SAM: A Self-adaptive Attention Module for Context-Aware Recommendation System
Jiabin Liu, Zheng Wei, Zhengpin Li et al.
Recently, textual information has been proved to play a positive role in recommendation systems. However, most of the existing methods only focus on representation learning of textual information in ratings, while potential selection bias induced by the textual information is ignored. In this work, we propose a novel and general self-adaptive module, the Self-adaptive Attention Module (SAM), which adjusts the selection bias by capturing contextual information based on its representation. This module can be embedded into recommendation systems that contain learning components of contextual information. Experimental results on three real-world datasets demonstrate the effectiveness of our proposal, and the state-of-the-art models with SAM significantly outperform the original ones.
ML Jun 9, 2021
Matrix Completion with Model-free Weighting
Jiayi Wang, Raymond K. W. Wong, Xiaojun Mao et al.
In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and can be computed efficiently via convex optimization. The recovered matrix based on the proposed weighted empirical risk enjoys appealing theoretical guarantees. In particular, the proposed method achieves a stronger guarantee than existing work in terms of the scaling with respect to the observation probabilities, under asymptotically heterogeneous missing settings (where entry-wise observation probabilities can be of different orders). These settings can be regarded as a better theoretical model of missing patterns with highly varying probabilities. We also provide a new minimax lower bound under a class of heterogeneous settings. Numerical experiments are also provided to demonstrate the effectiveness of the proposed method.
ML Mar 4, 2021
Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference
Jiyuan Tu, Weidong Liu, Xiaojun Mao et al.
This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes, namely arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach to hedge against Byzantine failures due to its ease of implementation and computational efficiency. However, the MOM estimator has a shortcoming in terms of statistical efficiency. The first main contribution of the paper is to propose a variance reduced median-of-means (VRMOM) estimator, which improves the statistical efficiency over the vanilla MOM estimator and is computationally as efficient as the MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of rounds of communications. We also provide the asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. Simulation results are also presented to illustrate the effectiveness of our method.
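For readers unfamiliar with the baseline, the vanilla median-of-means estimator for a scalar mean can be sketched as follows. This shows only the standard MOM construction, not the variance reduced VRMOM estimator proposed in the paper; the function name and the toy data are hypothetical. In a distributed reading, each block plays the role of one machine.

```python
import statistics


def median_of_means(values, num_blocks):
    """Vanilla median-of-means estimate of a scalar mean.

    Splits the data into num_blocks consecutive blocks of equal size,
    averages each block, and returns the median of the block means.
    Trailing values that do not fill a whole block are dropped.
    """
    if num_blocks < 1 or num_blocks > len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    block_size = len(values) // num_blocks
    block_means = [
        statistics.fmean(values[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# One corrupted value (1000.0) ruins a single block mean but not the median.
data = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
estimate = median_of_means(data, num_blocks=3)
```

The plain sample mean of the same data is 112.0, which shows why taking a median across blocks tolerates a minority of corrupted blocks, at the cost of some statistical efficiency on clean data.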
IT Dec 3, 2020
Compressive Sensing Approaches for Sparse Distribution Estimation Under Local Privacy
Zhongzheng Xiong, Jialin Sun, Xiaojun Mao et al.
In recent years, local differential privacy (LDP) has been adopted by many web service providers like Google \cite{erlingsson2014rappor}, Apple \cite{apple2017privacy} and Microsoft \cite{bolin2017telemetry} to collect and analyse users' data privately. In this paper, we consider the problem of discrete distribution estimation under local differential privacy constraints. Distribution estimation is one of the most fundamental estimation problems, which is widely studied in both non-private and private settings. In the local model, private mechanisms with provably optimal sample complexity are known. However, they are optimal only in the worst-case sense; their sample complexity is proportional to the size of the entire universe, which could be huge in practice. In this paper, we consider sparse or approximately sparse (e.g., highly skewed) distributions, and show that the number of samples needed can be significantly reduced. This problem has been studied recently \cite{acharya2021estimating}, but that work considers only strictly sparse distributions and the high privacy regime. We propose new privatization mechanisms based on compressive sensing. Our methods work for approximately sparse distributions and medium privacy, and have optimal sample and communication complexity.
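As background for the local model, a common baseline is k-ary randomized response: each user reports the true symbol with probability e^epsilon / (e^epsilon + k - 1) and any other fixed symbol with probability 1 / (e^epsilon + k - 1), and the server debiases the observed frequencies. The sketch below illustrates that baseline only; it is not the compressive sensing mechanism proposed in the paper, and all function names are hypothetical.

```python
import math
import random


def krr_probabilities(k, epsilon):
    """Return (p_keep, p_other) for k-ary randomized response."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def privatize(value, k, epsilon, rng):
    """Report the true symbol w.p. p_keep, else a uniform other symbol."""
    p_keep, _ = krr_probabilities(k, epsilon)
    if rng.random() < p_keep:
        return value
    other = rng.randrange(k - 1)  # uniform over the k - 1 remaining symbols
    return other if other < value else other + 1


def debias(observed_freqs, epsilon):
    """Invert E[f_v] = pi_v * p_keep + (1 - pi_v) * p_other for each symbol."""
    k = len(observed_freqs)
    p_keep, p_other = krr_probabilities(k, epsilon)
    return [(f - p_other) / (p_keep - p_other) for f in observed_freqs]


rng = random.Random(0)
reports = [privatize(1, k=3, epsilon=1.0, rng=rng) for _ in range(100)]
```

As k grows, p_keep and p_other move closer together, so the debiasing step divides by a smaller gap and amplifies sampling noise. This is one way to see the dependence on universe size that the abstract describes.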
ML Jun 18, 2020
Median Matrix Completion: from Embarrassment to Optimality
Weidong Liu, Xiaojun Mao, Raymond K. W. Wong
In this paper, we consider matrix completion with absolute deviation loss and obtain an estimator of the median matrix. Despite several appealing properties of the median, the non-smooth absolute deviation loss leads to computational challenges for large-scale data sets, which are increasingly common among matrix completion problems. A simple solution to large-scale problems is parallel computing. However, an embarrassingly parallel approach often leads to inefficient estimators. Based on the idea of pseudo data, we propose a novel refinement step, which turns such inefficient estimators into a rate (near-)optimal matrix completion procedure. The refined estimator is an approximation of a regularized least median estimator, and therefore not an ordinary regularized empirical risk estimator. This leads to a non-standard analysis of asymptotic behaviors. Empirical results are also provided to confirm the effectiveness of the proposed method.
ME Jun 13, 2019
Distributed High-dimensional Regression Under a Quantile Loss Function
Xi Chen, Weidong Liu, Xiaojun Mao et al.
This paper studies distributed estimation and support recovery for the high-dimensional linear regression model with heavy-tailed noise. To deal with heavy-tailed noise whose variance can be infinite, we adopt the quantile regression loss function instead of the commonly used squared loss. However, the non-smooth quantile loss poses new challenges to high-dimensional distributed estimation in both computation and theoretical development. To address the challenge, we transform the response variable and establish a new connection between quantile regression and ordinary linear regression. Then, we provide a distributed estimator that is efficient in both computation and communication, where only the gradient information is communicated at each iteration. Theoretically, we show that, after a constant number of iterations, the proposed estimator achieves a near-oracle convergence rate without any restriction on the number of machines. Moreover, we establish the theoretical guarantee for the support recovery. A simulation analysis is provided to demonstrate the effectiveness of our method.
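The quantile loss referred to here is the standard check loss, rho_tau(u) = u * (tau - 1{u < 0}). A minimal sketch of it follows, with a toy example of why it resists heavy-tailed noise. The helper names and the data are hypothetical, and the snippet does not implement the paper's response transformation or its distributed estimator.

```python
def check_loss(residual, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def empirical_check_loss(data, candidate, tau):
    """Average check loss of a constant predictor `candidate` on `data`."""
    return sum(check_loss(y - candidate, tau) for y in data) / len(data)


# Toy sample with one extreme value standing in for heavy-tailed noise.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# With tau = 0.5 the loss is half the absolute deviation, so the best
# constant among the data points is the sample median, not the mean (22.0).
best = min(data, key=lambda c: empirical_check_loss(data, c, tau=0.5))
```

The extreme value 100.0 moves the minimizer of the squared loss far to the right but leaves the minimizer of the check loss at the median, which is the motivation the abstract gives for replacing the squared loss.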
ML Dec 19, 2018
Matrix Completion under Low-Rank Missing Mechanism
Xiaojun Mao, Raymond K. W. Wong, Song Xi Chen
Matrix completion is a modern missing data problem where both the missing structure and the underlying parameter are high dimensional. Although the missing structure is a key component of any missing data problem, existing matrix completion methods often assume a simple uniform missing mechanism. In this work, we study matrix completion from corrupted data under a novel low-rank missing mechanism. The probability matrix of observation is estimated via a high dimensional low-rank matrix estimation procedure, and further used to complete the target matrix via inverse probability weighting. Due to both the high dimensional and extreme (i.e., very small) nature of the true probability matrix, the effect of inverse probability weighting requires careful study. We derive optimal asymptotic convergence rates of the proposed estimators for both the observation probabilities and the target matrix.
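Inverse probability weighting, as used in this abstract, reweights each observed entry by the reciprocal of its observation probability so that rarely observed entries are not underrepresented. The sketch below applies the idea to the simplest possible target, the average of all matrix entries, and assumes the observation probabilities are given. In the paper they are estimated through a low-rank model, which is not reproduced here; the function name and the toy matrices are hypothetical.

```python
def ipw_mean(values, observed, probs):
    """Inverse-probability-weighted estimate of the mean of all entries.

    values:   matrix as a list of rows; unobserved entries may be None.
    observed: matrix of booleans, True where the entry was observed.
    probs:    matrix of observation probabilities (assumed known here).
    """
    total = 0.0
    count = 0
    for row_v, row_o, row_p in zip(values, observed, probs):
        for v, o, p in zip(row_v, row_o, row_p):
            count += 1
            if o:
                # An entry seen with probability p stands in for 1 / p entries.
                total += v / p
    return total / count


values = [[2.0, None], [6.0, 8.0]]
observed = [[True, False], [True, True]]
probs = [[0.5, 0.5], [1.0, 1.0]]
estimate = ipw_mean(values, observed, probs)  # (2/0.5 + 6 + 8) / 4
```

Because each observed value is divided by its probability, a very small probability produces a very large weight, which is the instability the abstract says requires careful study.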