Dong Xia

ML
h-index5
32papers
793citations
Novelty54%
AI Score48

32 Papers

LGSep 1, 2022
Optimal Regularized Online Allocation by Adaptive Re-Solving

Wanteng Ma, Ying Cao, Danny H. K. Tsang et al.

This paper introduces a dual-based algorithm framework for solving the regularized online resource allocation problems, which have potentially non-concave cumulative rewards, hard resource constraints, and a non-separable regularizer. Under a strategy of adaptively updating the resource constraints, the proposed framework only requests approximate solutions to the empirical dual problems up to a certain accuracy and yet delivers an optimal logarithmic regret under a locally second-order growth condition. Surprisingly, a delicate analysis of the dual objective function enables us to eliminate the notorious log-log factor in regret bound. The flexible framework renders renowned and computationally fast algorithms immediately applicable, e.g., dual stochastic gradient descent. Additionally, an infrequent re-solving scheme is proposed, which significantly reduces computational demands without compromising the optimal regret performance. A worst-case square-root regret lower bound is established if the resource constraints are not adaptively updated during dual optimization, which underscores the critical role of adaptive dual variable update. Comprehensive numerical experiments demonstrate the merits of the proposed algorithm framework.

MLFeb 9, 2023Code
rMultiNet: An R Package For Multilayer Networks Analysis

Ting Li, Zhongyuan Lyu, Chenyu Ren et al.

This paper develops an R package rMultiNet to analyze multilayer network data. We provide two general frameworks from recent literature, e.g. mixture multilayer stochastic block model(MMSBM) and mixture multilayer latent space model(MMLSM) to generate the multilayer network. We also provide several methods to reveal the embedding of both nodes and layers followed by further data analysis methods, such as clustering. Three real data examples are processed in the package. The source code of rMultiNet is available at https://github.com/ChenyuzZZ73/rMultiNet.

MLJun 6, 2023
Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret

Jingyang Li, Jian-Feng Cai, Yang Chen et al.

Large tensor learning algorithms are typically computationally expensive and require storing a vast amount of data. In this paper, we propose a unified online Riemannian gradient descent (oRGrad) algorithm for tensor learning, which is computationally efficient, consumes much less memory, and can handle sequentially arriving data while making timely predictions. The algorithm is applicable to both linear and generalized linear models. If the time horizon T is known, oRGrad achieves statistical optimality by choosing an appropriate fixed step size. We find that noisy tensor completion particularly benefits from online algorithms by avoiding the trimming procedure and ensuring sharp entry-wise statistical error, which is often technically challenging for offline methods. The regret of oRGrad is analyzed, revealing a fascinating trilemma concerning the computational convergence rate, statistical error, and regret bound. By selecting an appropriate constant step size, oRGrad achieves an $O(T^{1/2})$ regret. We then introduce the adaptive-oRGrad algorithm, which can achieve the optimal $O(\log T)$ regret by adaptively selecting step sizes, regardless of whether the time horizon is known. The adaptive-oRGrad algorithm can attain a statistically optimal error rate without knowing the horizon. Comprehensive numerical simulations corroborate our theoretical findings. We show that oRGrad significantly outperforms its offline counterpart in predicting the solar F10.7 index with tensor predictors that monitor space weather impacts.

STJul 11, 2022
Optimal Clustering by Lloyd Algorithm for Low-Rank Mixture Model

Zhongyuan Lyu, Dong Xia

This paper investigates the computational and statistical limits in clustering matrix-valued observations. We propose a low-rank mixture model (LrMM), adapted from the classical Gaussian mixture model (GMM) to treat matrix-valued observations, which assumes low-rankness for population center matrices. A computationally efficient clustering method is designed by integrating Lloyd's algorithm and low-rank approximation. Once well-initialized, the algorithm converges fast and achieves an exponential-type clustering error rate that is minimax optimal. Meanwhile, we show that a tensor-based spectral method delivers a good initial clustering. Comparable to GMM, the minimax optimal clustering error rate is decided by the separation strength, i.e., the minimal distance between population center matrices. By exploiting low-rankness, the proposed algorithm is blessed with a weaker requirement on the separation strength. Unlike GMM, however, the computational difficulty of LrMM is characterized by the signal strength, i.e., the smallest non-zero singular values of population center matrices. Evidence is provided showing that no polynomial-time algorithm is consistent if the signal strength is not strong enough, even though the separation strength is strong. Intriguing differences between estimation and clustering under LrMM are discussed. The merits of low-rank Lloyd's algorithm are confirmed by comprehensive simulation experiments. Finally, our method outperforms others in the literature on real-world datasets.

MLSep 28, 2022
Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model

Tianxi Cai, Dong Xia, Luwan Zhang et al.

Network analysis has been a powerful tool to unveil relationships and interactions among a large number of objects. Yet its effectiveness in accurately identifying important node-node interactions is challenged by the rapidly growing network size, with data being collected at an unprecedented granularity and scale. Common wisdom to overcome such high dimensionality is collapsing nodes into smaller groups and conducting connectivity analysis on the group level. Dividing efforts into two phases inevitably opens a gap in consistency and drives down efficiency. Consensus learning emerges as a new normal for common knowledge discovery with multiple data sources available. In this paper, we propose a unified multi-view sparse low-rank block model (msLBM) framework, which enables simultaneous grouping and connectivity analysis by combining multiple data sources. The msLBM framework efficiently represents overlapping information across large scale concepts and accommodates different types of heterogeneity across sources. Both features are desirable when analyzing high dimensional electronic health record (EHR) datasets from multiple health systems. An estimating procedure based on the alternating minimization algorithm is proposed. Our theoretical results demonstrate that a consensus knowledge graph can be more accurately learned by leveraging multi-source datasets, and statistically optimal rates can be achieved under mild conditions. Applications to the real world EHR data suggest that our proposed msLBM algorithm can more reliably reveal network structure among clinical concepts by effectively combining summary level EHR data from multiple health systems.

CVJan 27Code
DSVM-UNet : Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation

Renrong Shao, Dongyang Li, Dong Xia et al.

Vision Mamba models have been extensively researched in various fields, which address the limitations of previous models by effectively managing long-range dependencies with a linear-time overhead. Several prospective studies have further designed Vision Mamba based on UNet(VM-UNet) for medical image segmentation. These approaches primarily focus on optimizing architectural designs by creating more complex structures to enhance the model's ability to perceive semantic features. In this paper, we propose a simple yet effective approach to improve the model by Dual Self-distillation for VM-UNet (DSVM-UNet) without any complex architectural designs. To achieve this goal, we develop double self-distillation methods to align the features at both the global and local levels. Extensive experiments conducted on the ISIC2017, ISIC2018, and Synapse benchmarks demonstrate that our approach achieves state-of-the-art performance while maintaining computational efficiency. Code is available at https://github.com/RoryShao/DSVM-UNet.git.

STNov 27, 2023
Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

Zhongyuan Lyu, Ting Li, Dong Xia

In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate, which takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixture of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.

MLSep 26, 2024
Local Prediction-Powered Inference

Yanwu Gu, Dong Xia

To infer a function value on a specific point $x$, it is essential to assign higher weights to the points closer to $x$, which is called local polynomial / multivariable regression. In many practical cases, a limited sample size may ruin this method, but such conditions can be improved by the Prediction-Powered Inference (PPI) technique. This paper introduced a specific algorithm for local multivariable regression using PPI, which can significantly reduce the variance of estimations without enlarge the error. The confidence intervals, bias correction, and coverage probabilities are analyzed and proved the correctness and superiority of our algorithm. Numerical simulation and real-data experiments are applied and show these conclusions. Another contribution compared to PPI is the theoretical computation efficiency and explainability by taking into account the dependency of the dependent variable.

LGNov 2, 2023
High-dimensional Linear Bandits with Knapsacks

Wanteng Ma, Dong Xia, Jiashuo Jiang

We investigate the contextual bandits with knapsack (CBwK) problem in a high-dimensional linear setting, where the feature dimension can be very large. Our goal is to harness sparsity to obtain sharper regret guarantees. To this end, we first develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner. We then embed this estimator in a primal-dual scheme: every knapsack constraint is paired with a dual variable, which is updated by an online learning rule to keep the cumulative resource consumption within budget. This integrated approach achieves a two-phase sub-linear regret that scales only logarithmically with the feature dimension, improving on the polynomial dependency reported in prior work. Furthermore, we show that either of the following structural assumptions is sufficient for a sharper regret bound of $\tilde{O}(s_{0} \sqrt{T})$: (i) a diverse-covariate condition; and (ii) a margin condition. When both conditions hold simultaneously, we can further control the regret to $O(s_{0}^{2} \log(dT)\log T)$ by a dual resolving scheme. As a by-product, applying our framework to high-dimensional contextual bandits without knapsack constraints recovers the optimal regret rates in both the data-poor and data-rich regimes. Finally, numerical experiments confirm the empirical efficiency of our algorithms in high-dimensional settings.

79.3MLApr 30
Prediction-powered Inference by Mixture of Experts

Yanwu Gu, Linglong Kong, Dong Xia

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.

MLApr 26, 2024
Online Policy Learning and Inference by Matrix Completion

Congyuan Duan, Jingyang Li, Dong Xia

Is it possible to make online decisions when personalized covariates are unavailable? We take a collaborative-filtering approach for decision-making based on collective preferences. By assuming low-dimensional latent features, we formulate the covariate-free decision-making problem as a matrix completion bandit. We propose a policy learning procedure that combines an $\varepsilon$-greedy policy for decision-making with an online gradient descent algorithm for bandit parameter estimation. Our novel two-phase design balances policy learning accuracy and regret performance. For policy inference, we develop an online debiasing method based on inverse propensity weighting and establish its asymptotic normality. Our methods are applied to data from the San Francisco parking pricing project, revealing intriguing discoveries and outperforming the benchmark policy.

LGNov 10, 2024
Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates

Congyuan Duan, Wanteng Ma, Jiashuo Jiang et al.

This paper investigates regret minimization, statistical inference, and their interplay in high-dimensional online decision-making based on the sparse linear context bandit model. We integrate the $\varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters and introduce an inference framework based on a debiasing method using inverse propensity weighting. Under a margin condition, our method achieves either $O(T^{1/2})$ regret or classical $O(T^{1/2})$-consistent inference, indicating an unavoidable trade-off between exploration and exploitation. If a diverse covariate condition holds, we demonstrate that a pure-greedy bandit algorithm, i.e., exploration-free, combined with a debiased estimator based on average weighting can simultaneously achieve optimal $O(\log T)$ regret and $O(T^{1/2})$-consistent inference. We also show that a simple sample mean estimator can provide valid inference for the optimal policy's value. Numerical simulations and experiments on Warfarin dosing data validate the effectiveness of our methods.

STJan 22, 2022
Optimal Estimation and Computational Limit of Low-rank Gaussian Mixtures

Zhongyuan Lyu, Dong Xia

Structural matrix-variate observations routinely arise in diverse fields such as multi-layer network analysis and brain image clustering. While data of this type have been extensively investigated with fruitful outcomes being delivered, the fundamental questions like its statistical optimality and computational limit are largely under-explored. In this paper, we propose a low-rank Gaussian mixture model (LrMM) assuming each matrix-valued observation has a planted low-rank structure. Minimax lower bounds for estimating the underlying low-rank matrix are established allowing a whole range of sample sizes and signal strength. Under a minimal condition on signal strength, referred to as the information-theoretical limit or statistical limit, we prove the minimax optimality of a maximum likelihood estimator which, in general, is computationally infeasible. If the signal is stronger than a certain threshold, called the computational limit, we design a computationally fast estimator based on spectral aggregation and demonstrate its minimax optimality. Moreover, when the signal strength is smaller than the computational limit, we provide evidences based on the low-degree likelihood ratio framework to claim that no polynomial-time algorithm can consistently recover the underlying low-rank matrix. Our results reveal multiple phase transitions in the minimax error rates and the statistical-to-computational gap. Numerical experiments confirm our theoretical findings. We further showcase the merit of our spectral aggregation method on the worldwide food trading dataset.

LGAug 27, 2021
Provable Tensor-Train Format Tensor Completion by Riemannian Optimization

Jian-Feng Cai, Jingyang Li, Dong Xia

The tensor train (TT) format enjoys appealing advantages in handling structural high-order tensors. The recent decade has witnessed the wide applications of TT-format tensors from diverse disciplines, among which tensor completion has drawn considerable attention. Numerous fast algorithms, including the Riemannian gradient descent (RGrad), have been proposed for the TT-format tensor completion. However, the theoretical guarantees of these algorithms are largely missing or sub-optimal, partly due to the complicated and recursive algebraic operations in TT-format decomposition. Moreover, existing results established for the tensors of other formats, for example, Tucker and CP, are inapplicable because the algorithms treating TT-format tensors are substantially different and more involved. In this paper, we provide, to our best knowledge, the first theoretical guarantees of the convergence of RGrad algorithm for TT-format tensor completion, under a nearly optimal sample size condition. The RGrad algorithm converges linearly with a constant contraction rate that is free of tensor condition number without the necessity of re-conditioning. We also propose a novel approach, referred to as the sequential second-order moment method, to attain a warm initialization under a similar sample size requirement. As a byproduct, our result even significantly refines the prior investigation of RGrad algorithm for matrix completion. Lastly, statistically (near) optimal rate is derived for RGrad algorithm if the observed entries consist of random sub-Gaussian noise. Numerical experiments confirm our theoretical discovery and showcase the computational speedup gained by the TT-format decomposition.

LGJun 30, 2021
Latent Space Model for Higher-order Networks and Generalized Tensor Decomposition

Zhongyuan Lyu, Dong Xia, Yuan Zhang

We introduce a unified framework, formulated as general latent space models, to study complex higher-order network interactions among multiple entities. Our framework covers several popular models in recent network analysis literature, including mixture multi-layer latent space model and hypergraph latent space model. We formulate the relationship between the latent positions and the observed data via a generalized multilinear kernel as the link function. While our model enjoys decent generality, its maximum likelihood parameter estimation is also convenient via a generalized tensor decomposition procedure.We propose a novel algorithm using projected gradient descent on Grassmannians. We also develop original theoretical guarantees for our algorithm. First, we show its linear convergence under mild conditions. Second, we establish finite-sample statistical error rates of latent position estimation, determined by the signal strength, degrees of freedom and the smoothness of link function, for both general and specific latent space models. We demonstrate the effectiveness of our method on synthetic data. We also showcase the merit of our method on two real-world datasets that are conventionally described by different specific models in producing meaningful and interpretable parameter estimations and accurate link prediction. We demonstrate the effectiveness of our method on synthetic data. We also showcase the merit of our method on two real-world datasets that are conventionally described by different specific models in producing meaningful and interpretable parameter estimations and accurate link prediction.

STDec 29, 2020
Inference for Low-rank Tensors -- No Need to Debias

Dong Xia, Anru R. Zhang, Yuchen Zhou

In this paper, we consider the statistical inference for several low-rank tensor models. Specifically, in the Tucker low-rank tensor PCA or regression model, provided with any estimates achieving some attainable error rate, we develop the data-driven confidence regions for the singular subspace of the parameter tensor based on the asymptotic distribution of an updated estimate by two-iteration alternating minimization. The asymptotic distributions are established under some essential conditions on the signal-to-noise ratio (in PCA model) or sample size (in regression model). If the parameter tensor is further orthogonally decomposable, we develop the methods and non-asymptotic theory for inference on each individual singular vector. For the rank-one tensor PCA model, we establish the asymptotic distribution for general linear forms of principal components and confidence interval for each entry of the parameter tensor. Finally, numerical simulations are presented to corroborate our theoretical discoveries. In all these models, we observe that different from many matrix/vector settings in existing work, debiasing is not required to establish the asymptotic distribution of estimates or to make statistical inference on low-rank tensors. In fact, due to the widely observed statistical-computational-gap for low-rank tensor estimation, one usually requires stronger conditions than the statistical (or information-theoretic) limit to ensure the computationally feasible estimation is achievable. Surprisingly, such conditions ``incidentally" render a feasible low-rank tensor inference without debiasing.

STApr 14, 2020
Edgeworth expansions for network moments

Yuan Zhang, Dong Xia

Network method of moments arXiv:1202.5101 is an important tool for nonparametric network inference. However, there has been little investigation on accurate descriptions of the sampling distributions of network moment statistics. In this paper, we present the first higher-order accurate approximation to the sampling CDF of a studentized network moment by Edgeworth expansion. In sharp contrast to classical literature on noiseless U-statistics, we show that the Edgeworth expansion of a network moment statistic as a noisy U-statistic can achieve higher-order accuracy without non-lattice or smoothness assumptions but just requiring weak regularity conditions. Behind this result is our surprising discovery that the two typically-hated factors in network analysis, namely, sparsity and edge-wise observational errors, jointly play a blessing role, contributing a crucial self-smoothing effect in the network moment statistic and making it analytically tractable. Our assumptions match the minimum requirements in related literature. For sparse networks, our theory shows a simple normal approximation achieves a gradually depreciating Berry-Esseen bound as the network becomes sparser. This result also refines the best previous theoretical result. For practitioners, our empirical Edgeworth expansion is highly accurate, fast and easy to implement. We demonstrate the clear advantage of our method by comprehensive simulation studies. We showcase three applications of our results in network inference. We prove, to our knowledge, the first theoretical guarantee of higher-order accuracy for some network bootstrap schemes, and moreover, the first theoretical guidance for selecting the sub-sample size for network sub-sampling. We also derive one-sample test and Cornish-Fisher confidence interval for a given moment with higher-order accurate controls of confidence level and type I error, respectively.

SIFeb 10, 2020
Community Detection on Mixture Multi-layer Networks via Regularized Tensor Decomposition

Bing-Yi Jing, Ting Li, Zhongyuan Lyu et al.

We study the problem of community detection in multi-layer networks, where pairs of nodes can be related in multiple modalities. We introduce a general framework, i.e., mixture multi-layer stochastic block model (MMSBM), which includes many earlier models as special cases. We propose a tensor-based algorithm (TWIST) to reveal both global/local memberships of nodes, and memberships of layers. We show that the TWIST procedure can accurately detect the communities with small misclassification error as the number of nodes and/or the number of layers increases. Numerical studies confirm our theoretical findings. To our best knowledge, this is the first systematic study on the mixture multi-layer networks using tensor decomposition. The method is applied to two real datasets: worldwide trading networks and malaria parasite genes networks, yielding new and interesting findings.

STAug 31, 2019
Statistical Inferences of Linear Forms for Noisy Matrix Completion

Dong Xia, Ming Yuan

We introduce a flexible framework for making inferences about general linear forms of a large matrix based on noisy observations of a subset of its entries. In particular, under mild regularity conditions, we develop a universal procedure to construct asymptotically normal estimators of its linear forms through double-sample debiasing and low-rank projection whenever an entry-wise consistent estimator of the matrix is available. These estimators allow us to subsequently construct confidence intervals for and test hypotheses about the linear forms. Our proposal was motivated by a careful perturbation analysis of the empirical singular spaces under the noisy matrix completion model which might be of independent interest. The practical merits of our proposed inference procedure are demonstrated on both simulated and real-world data examples.

STJan 2, 2019
Normal Approximation and Confidence Region of Singular Subspaces

Dong Xia

This paper is on the normal approximation of singular subspaces when the noise matrix has i.i.d. entries. Our contributions are three-fold. First, we derive an explicit representation formula of the empirical spectral projectors. The formula is neat and holds for deterministic matrix perturbations. Second, we calculate the expected projection distance between the empirical singular subspaces and true singular subspaces. Our method allows obtaining arbitrary $k$-th order approximation of the expected projection distance. Third, we prove the non-asymptotical normal approximation of the projection distance with different levels of bias corrections. By the $\lceil \log(d_1+d_2)\rceil$-th order bias corrections, the asymptotical normality holds under optimal signal-to-noise ration (SNR) condition where $d_1$ and $d_2$ denote the matrix sizes. In addition, it shows that higher order approximations are unnecessary when $|d_1-d_2|=O((d_1+d_2)^{1/2})$. Finally, we provide comprehensive simulation results to merit our theoretic discoveries. Unlike the existing results, our approach is non-asymptotical and the convergence rates are established. Our method allows the rank $r$ to diverge as fast as $o((d_1+d_2)^{1/3})$. Moreover, our method requires no eigen-gap condition (except the SNR) and no constraints between $d_1$ and $d_2$.

STAug 24, 2018
Non-asymptotic bounds for percentiles of independent non-identical random variables

Dong Xia

This note displays an interesting phenomenon for percentiles of independent but non-identical random variables. Let $X_1,\cdots,X_n$ be independent random variables obeying non-identical continuous distributions and $X^{(1)}\geq \cdots\geq X^{(n)}$ be the corresponding order statistics. For any $p\in(0,1)$, we investigate the $100(1-p)$%-th percentile $X^{(pn)}$ and prove non-asymptotic bounds for $X^{(pn)}$. In particular, for a wide class of distributions, we discover an intriguing connection between their median and the harmonic mean of the associated standard deviations. For example, if $X_k\sim\mathcal{N}(0,σ_k^2)$ for $k=1,\cdots,n$ and $p=\frac{1}{2}$, we show that its median $\big|{\rm Med}\big(X_1,\cdots,X_n\big)\big|= O_P\Big(n^{1/2}\cdot\big(\sum_{k=1}^nσ_k^{-1}\big)^{-1}\Big)$ as long as $\{σ_k\}_{k=1}^n$ satisfy certain mild non-dispersion property.

STMay 24, 2018
Confidence Region of Singular Subspaces for Low-rank Matrix Regression

Dong Xia

Low-rank matrix regression refers to the instances of recovering a low-rank matrix based on specially designed measurements and the corresponding noisy outcomes. In the last decade, numerous statistical methodologies have been developed for efficiently recovering the unknown low-rank matrices. However, in some applications, the unknown singular subspace is scientifically more important than the low-rank matrix itself. In this article, we revisit the low-rank matrix regression model and introduce a two-step procedure to construct confidence regions of the singular subspace. The procedure involves the de-biasing for the typical low-rank estimators after which we calculate the empirical singular vectors. We investigate the distribution of the joint projection distance between the empirical singular subspace and the unknown true singular subspace. We specifically prove the asymptotical normality of the joint projection distance with data-dependent centering and normalization when $r^{3/2}(m_1+m_2)^{3/2}=o(n/\log n)$ where $m_1, m_2$ denote the matrix row and column sizes, $r$ is the rank and $n$ is the number of independent random measurements. Consequently, we propose data-dependent confidence regions of the true singular subspace which attains any pre-determined confidence level asymptotically. In addition, non-asymptotical convergence rates are also established. Numerical results are presented to demonstrate the merits of our methods.

MLNov 14, 2017
Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries

Dong Xia, Ming Yuan, Cun-Hui Zhang

In this article, we develop methods for estimating a low rank tensor from noisy observations on a subset of its entries to achieve both statistical and computational efficiencies. There have been a lot of recent interests in this problem of noisy tensor completion. Much of the attention has been focused on the fundamental computational challenges often associated with problems involving higher order tensors, yet very little is known about their statistical performance. To fill in this void, in this article, we characterize the fundamental statistical limits of noisy tensor completion by establishing minimax optimal rates of convergence for estimating a $k$th order low rank tensor under the general $\ell_p$ ($1\le p\le 2$) norm which suggest significant room for improvement over the existing approaches. Furthermore, we propose a polynomial-time computable estimating procedure based upon power iteration and a second-order spectral initialization that achieves the optimal rates of convergence. Our method is fairly easy to implement and numerical experiments are presented to further demonstrate the practical merits of our estimator.

MEOct 31, 2017
Effective Tensor Sketching via Sparsification

Dong Xia, Ming Yuan

In this paper, we investigate effective sketching schemes via sparsification for high dimensional multilinear arrays or tensors. More specifically, we propose a novel tensor sparsification algorithm that retains a subset of the entries of a tensor in a judicious way, and prove that it can attain a given level of approximation accuracy in terms of tensor spectral norm with a much smaller sample complexity when compared with existing approaches. In particular, we show that for a $k$th order $d\times\cdots\times d$ cubic tensor of {\it stable rank} $r_s$, the sample size requirement for achieving a relative error $\varepsilon$ is, up to a logarithmic factor, of the order $r_s^{1/2} d^{k/2} /\varepsilon$ when $\varepsilon$ is relatively large, and $r_s d /\varepsilon^2$ and essentially optimal when $\varepsilon$ is sufficiently small. It is especially noteworthy that the sample size requirement for achieving a high accuracy is of an order independent of $k$. To further demonstrate the utility of our techniques, we also study how higher order singular value decomposition (HOSVD) of large tensors can be efficiently approximated via sparsification.

STJul 5, 2017
The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising

Dong Xia, Fan Zhou

The higher order singular value decomposition (HOSVD) of tensors is a generalization of matrix SVD. The perturbation analysis of HOSVD under random noise is more delicate than its matrix counterpart. Recently, polynomial time algorithms have been proposed where statistically optimal estimates of the singular subspaces and the low rank tensors are attainable in the Euclidean norm. In this article, we analyze the sup-norm perturbation bounds of HOSVD and introduce estimators of the singular subspaces with sharp deviation bounds in the sup-norm. We also investigate a low rank tensor denoising estimator and demonstrate its fast convergence rate with respect to the entry-wise errors. The sup-norm perturbation bounds reveal unconventional phase transitions for statistical learning applications such as the exact clustering in high dimensional Gaussian mixture model and the exact support recovery in sub-tensor localizations. In addition, the bounds established for HOSVD also elaborate the one-sided sup-norm perturbation bounds for the singular subspaces of unbalanced (or fat) matrices.

STMar 8, 2017
Tensor SVD: Statistical and Computational Limits

Anru Zhang, Dong Xia

In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits for tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). In particular, with strong SNR, we show that the classical higher-order orthogonal iteration achieves the minimax optimal rate of convergence in estimation; with weak SNR, the information-theoretical lower bound implies that it is impossible to have consistent estimation in general; with moderate SNR, we show that the non-convex maximum likelihood estimation provides optimal solution, but with NP-hard computational cost; moreover, under the hardness hypothesis of hypergraphic planted clique detection, there are no polynomial-time algorithms performing consistently in general.

MLFeb 22, 2017
On Polynomial Time Methods for Exact Low Rank Tensor Completion

Dong Xia, Ming Yuan

In this paper, we investigate the sample size requirement for exact recovery of a high order tensor of low rank from a subset of its entries. We show that a gradient descent algorithm with initial value obtained from a spectral method can, in particular, reconstruct a ${d\times d\times d}$ tensor of multilinear ranks $(r,r,r)$ with high probability from as few as $O(r^{7/2}d^{3/2}\log^{7/2}d+r^7d\log^6d)$ entries. In the case when the ranks $r=O(1)$, our sample size requirement matches those for nuclear norm minimization (Yuan and Zhang, 2016a), or alternating least squares assuming orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier approaches, however, our method is efficient to compute, easy to implement, and does not impose extra structures on the tensor. Numerical results are presented to further demonstrate the merits of the proposed approach.

MLOct 16, 2016
Estimation of low rank density matrices by Pauli measurements

Dong Xia

Density matrices are positively semi-definite Hermitian matrices with unit trace that describe the states of quantum systems. Many quantum systems of physical interest can be represented as high-dimensional low rank density matrices. A popular problem in {\it quantum state tomography} (QST) is to estimate the unknown low rank density matrix of a quantum system by conducting Pauli measurements. Our main contribution is twofold. First, we establish the minimax lower bounds in Schatten $p$-norms with $1\leq p\leq +\infty$ for low rank density matrices estimation by Pauli measurements. In our previous paper, these minimax lower bounds are proved under the trace regression model with Gaussian noise and the noise is assumed to have common variance. In this paper, we prove these bounds under the Binomial observation model which meets the actual model in QST. Second, we study the Dantzig estimator (DE) for estimating the unknown low rank density matrix under the Binomial observation model by using Pauli measurements. In our previous papers, we studied the least squares estimator and the projection estimator, where we proved the optimal convergence rates for the least squares estimator in Schatten $p$-norms with $1\leq p\leq 2$ and, under a stronger condition, the optimal convergence rates for the projection estimator in Schatten $p$-norms with $1\leq p\leq +\infty$. In this paper, we show that the results of these two distinct estimators can be simultaneously obtained by the Dantzig estimator. Moreover, better convergence rates in Schatten norm distances can be proved for Dantzig estimator under conditions weaker than those needed in previous papers. When the objective function of DE is replaced by the negative von Neumann entropy, we obtain sharp convergence rate in Kullback-Leibler divergence.

MLApr 15, 2016
Estimation of low rank density matrices: bounds in Schatten norms and other distances

Dong Xia, Vladimir Koltchinskii

Let ${\mathcal S}_m$ be the set of all $m\times m$ density matrices (Hermitian positively semi-definite matrices of unit trace). Consider a problem of estimation of an unknown density matrix $ρ\in {\mathcal S}_m$ based on outcomes of $n$ measurements of observables $X_1,\dots, X_n\in {\mathbb H}_m$ (${\mathbb H}_m$ being the space of $m\times m$ Hermitian matrices) for a quantum system identically prepared $n$ times in state $ρ.$ Outcomes $Y_1,\dots, Y_n$ of such measurements could be described by a trace regression model in which ${\mathbb E}_ρ(Y_j|X_j)={\rm tr}(ρX_j), j=1,\dots, n.$ The design variables $X_1,\dots, X_n$ are often sampled at random from the uniform distribution in an orthonormal basis $\{E_1,\dots, E_{m^2}\}$ of ${\mathbb H}_m$ (such as Pauli basis). The goal is to estimate the unknown density matrix $ρ$ based on the data $(X_1,Y_1), \dots, (X_n,Y_n).$ Let $$ \hat Z:=\frac{m^2}{n}\sum_{j=1}^n Y_j X_j $$ and let $\check ρ$ be the projection of $\hat Z$ onto the convex set ${\mathcal S}_m$ of density matrices. It is shown that for estimator $\check ρ$ the minimax lower bounds in classes of low rank density matrices (established earlier) are attained up logarithmic factors for all Schatten $p$-norm distances, $p\in [1,\infty]$ and for Bures version of quantum Hellinger distance. Moreover, for a slightly modified version of estimator $\check ρ$ the same property holds also for quantum relative entropy (Kullback-Leibler) distance between density matrices.

MLJul 17, 2015
Optimal Estimation of Low Rank Density Matrices

Vladimir Koltchinskii, Dong Xia

The density matrices are positively semi-definite Hermitian matrices of unit trace that describe the state of a quantum system. The goal of the paper is to develop minimax lower bounds on error rates of estimation of low rank density matrices in trace regression models used in quantum state tomography (in particular, in the case of Pauli measurements) with explicit dependence of the bounds on the rank and other complexity parameters. Such bounds are established for several statistically relevant distances, including quantum versions of Kullback-Leibler divergence (relative entropy distance) and of Hellinger distance (so called Bures distance), and Schatten $p$-norm distances. Sharp upper bounds and oracle inequalities for least squares estimator with von Neumann entropy penalization are obtained showing that minimax lower bounds are attained (up to logarithmic factors) for these distances.

MLDec 26, 2014
Exploring Sparsity in Multi-class Linear Discriminant Analysis

Dong Xia

Recent studies in the literature have paid much attention to the sparsity in linear classification tasks. One motivation of imposing sparsity assumption on the linear discriminant direction is to rule out the noninformative features, making hardly contribution to the classification problem. Most of those work were focused on the scenarios of binary classification. In the presence of multi-class data, preceding researches recommended individually pairwise sparse linear discriminant analysis(LDA). However, further sparsity should be explored. In this paper, an estimator of grouped LASSO type is proposed to take advantage of sparsity for multi-class data. It enjoys appealing non-asymptotic properties which allows insignificant correlations among features. This estimator exhibits superior capability on both simulated and real data.

MLMar 25, 2014
Optimal Schatten-q and Ky-Fan-k Norm Rate of Low Rank Matrix Estimation

Dong Xia

In this paper, we consider low rank matrix estimation using either matrix-version Dantzig Selector $\hat{A}_λ^d$ or matrix-version LASSO estimator $\hat{A}_λ^L$. We consider sub-Gaussian measurements, $i.e.$, the measurements $X_1,\ldots,X_n\in\mathbb{R}^{m\times m}$ have $i.i.d.$ sub-Gaussian entries. Suppose $\textrm{rank}(A_0)=r$. We proved that, when $n\geq Cm[r^2\vee r\log(m)\log(n)]$ for some $C>0$, both $\hat{A}_λ^d$ and $\hat{A}_λ^L$ can obtain optimal upper bounds(except some logarithmic terms) for estimation accuracy under spectral norm. By applying metric entropy of Grassmann manifolds, we construct (near) matching minimax lower bound for estimation accuracy under spectral norm. We also give upper bounds and matching minimax lower bound(except some logarithmic terms) for estimation accuracy under Schatten-q norm for every $1\leq q\leq\infty$. As a direct corollary, we show both upper bounds and minimax lower bounds of estimation accuracy under Ky-Fan-k norms for every $1\leq k\leq m$.