Dong Xia

h-index19

20papers

642citations

Novelty57%

AI Score33

Ranked #121,682 of 194,257 authors (top 63%)#86 in ST (top 43%)

20 Papers

8.6MLJun 6, 2023

Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret

Jingyang Li, Jian-Feng Cai, Yang Chen et al.

Large tensor learning algorithms are typically computationally expensive and require storing a vast amount of data. In this paper, we propose a unified online Riemannian gradient descent (oRGrad) algorithm for tensor learning, which is computationally efficient, consumes much less memory, and can handle sequentially arriving data while making timely predictions. The algorithm is applicable to both linear and generalized linear models. If the time horizon T is known, oRGrad achieves statistical optimality by choosing an appropriate fixed step size. We find that noisy tensor completion particularly benefits from online algorithms by avoiding the trimming procedure and ensuring sharp entry-wise statistical error, which is often technically challenging for offline methods. The regret of oRGrad is analyzed, revealing a fascinating trilemma concerning the computational convergence rate, statistical error, and regret bound. By selecting an appropriate constant step size, oRGrad achieves an $O(T^{1/2})$ regret. We then introduce the adaptive-oRGrad algorithm, which can achieve the optimal $O(\log T)$ regret by adaptively selecting step sizes, regardless of whether the time horizon is known. The adaptive-oRGrad algorithm can attain a statistically optimal error rate without knowing the horizon. Comprehensive numerical simulations corroborate our theoretical findings. We show that oRGrad significantly outperforms its offline counterpart in predicting the solar F10.7 index with tensor predictors that monitor space weather impacts.

3.3STNov 27, 2023

Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

Zhongyuan Lyu, Ting Li, Dong Xia

In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate, which takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixture of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.

14.5IVSep 28, 2024

Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion

Jiaxing Xu, Mengcheng Lan, Xia Dong et al.

In the realm of neuroscience, identifying distinctive patterns associated with neurological disorders via brain networks is crucial. Resting-state functional magnetic resonance imaging (fMRI) serves as a primary tool for mapping these networks by correlating blood-oxygen-level-dependent (BOLD) signals across different brain regions, defined as regions of interest (ROIs). Constructing these brain networks involves using atlases to parcellate the brain into ROIs based on various hypotheses of brain division. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Some recent methods have proposed utilizing multiple atlases, but they neglect consistency across atlases and lack ROI-level information exchange. To tackle these limitations, we propose an Atlas-Integrated Distillation and Fusion network (AIDFusion) to improve brain network classification using fMRI data. AIDFusion addresses the challenge of utilizing multiple atlases by employing a disentangle Transformer to filter out inconsistent atlas-specific information and distill distinguishable connections across atlases. It also incorporates subject- and population-level consistency constraints to enhance cross-atlas consistency. Additionally, AIDFusion employs an inter-atlas message-passing mechanism to fuse complementary information across brain regions. Experimental results on four datasets of different diseases demonstrate the effectiveness and efficiency of AIDFusion compared to state-of-the-art methods. A case study illustrates AIDFusion extract patterns that are both interpretable and consistent with established neuroscience findings.

5.3LGNov 2, 2023

High-dimensional Linear Bandits with Knapsacks

Wanteng Ma, Dong Xia, Jiashuo Jiang

We investigate the contextual bandits with knapsack (CBwK) problem in a high-dimensional linear setting, where the feature dimension can be very large. Our goal is to harness sparsity to obtain sharper regret guarantees. To this end, we first develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner. We then embed this estimator in a primal-dual scheme: every knapsack constraint is paired with a dual variable, which is updated by an online learning rule to keep the cumulative resource consumption within budget. This integrated approach achieves a two-phase sub-linear regret that scales only logarithmically with the feature dimension, improving on the polynomial dependency reported in prior work. Furthermore, we show that either of the following structural assumptions is sufficient for a sharper regret bound of $\tilde{O}(s_{0} \sqrt{T})$: (i) a diverse-covariate condition; and (ii) a margin condition. When both conditions hold simultaneously, we can further control the regret to $O(s_{0}^{2} \log(dT)\log T)$ by a dual resolving scheme. As a by-product, applying our framework to high-dimensional contextual bandits without knapsack constraints recovers the optimal regret rates in both the data-poor and data-rich regimes. Finally, numerical experiments confirm the empirical efficiency of our algorithms in high-dimensional settings.

4.6LGNov 10, 2024

Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates

Congyuan Duan, Wanteng Ma, Jiashuo Jiang et al.

This paper investigates regret minimization, statistical inference, and their interplay in high-dimensional online decision-making based on the sparse linear context bandit model. We integrate the $\varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters and introduce an inference framework based on a debiasing method using inverse propensity weighting. Under a margin condition, our method achieves either $O(T^{1/2})$ regret or classical $O(T^{1/2})$-consistent inference, indicating an unavoidable trade-off between exploration and exploitation. If a diverse covariate condition holds, we demonstrate that a pure-greedy bandit algorithm, i.e., exploration-free, combined with a debiased estimator based on average weighting can simultaneously achieve optimal $O(\log T)$ regret and $O(T^{1/2})$-consistent inference. We also show that a simple sample mean estimator can provide valid inference for the optimal policy's value. Numerical simulations and experiments on Warfarin dosing data validate the effectiveness of our methods.

13.1LGAug 27, 2021

Provable Tensor-Train Format Tensor Completion by Riemannian Optimization

Jian-Feng Cai, Jingyang Li, Dong Xia

The tensor train (TT) format enjoys appealing advantages in handling structural high-order tensors. The recent decade has witnessed the wide applications of TT-format tensors from diverse disciplines, among which tensor completion has drawn considerable attention. Numerous fast algorithms, including the Riemannian gradient descent (RGrad), have been proposed for the TT-format tensor completion. However, the theoretical guarantees of these algorithms are largely missing or sub-optimal, partly due to the complicated and recursive algebraic operations in TT-format decomposition. Moreover, existing results established for the tensors of other formats, for example, Tucker and CP, are inapplicable because the algorithms treating TT-format tensors are substantially different and more involved. In this paper, we provide, to our best knowledge, the first theoretical guarantees of the convergence of RGrad algorithm for TT-format tensor completion, under a nearly optimal sample size condition. The RGrad algorithm converges linearly with a constant contraction rate that is free of tensor condition number without the necessity of re-conditioning. We also propose a novel approach, referred to as the sequential second-order moment method, to attain a warm initialization under a similar sample size requirement. As a byproduct, our result even significantly refines the prior investigation of RGrad algorithm for matrix completion. Lastly, statistically (near) optimal rate is derived for RGrad algorithm if the observed entries consist of random sub-Gaussian noise. Numerical experiments confirm our theoretical discovery and showcase the computational speedup gained by the TT-format decomposition.

9.9LGJun 30, 2021

Latent Space Model for Higher-order Networks and Generalized Tensor Decomposition

Zhongyuan Lyu, Dong Xia, Yuan Zhang

We introduce a unified framework, formulated as general latent space models, to study complex higher-order network interactions among multiple entities. Our framework covers several popular models in recent network analysis literature, including mixture multi-layer latent space model and hypergraph latent space model. We formulate the relationship between the latent positions and the observed data via a generalized multilinear kernel as the link function. While our model enjoys decent generality, its maximum likelihood parameter estimation is also convenient via a generalized tensor decomposition procedure.We propose a novel algorithm using projected gradient descent on Grassmannians. We also develop original theoretical guarantees for our algorithm. First, we show its linear convergence under mild conditions. Second, we establish finite-sample statistical error rates of latent position estimation, determined by the signal strength, degrees of freedom and the smoothness of link function, for both general and specific latent space models. We demonstrate the effectiveness of our method on synthetic data. We also showcase the merit of our method on two real-world datasets that are conventionally described by different specific models in producing meaningful and interpretable parameter estimations and accurate link prediction. We demonstrate the effectiveness of our method on synthetic data. We also showcase the merit of our method on two real-world datasets that are conventionally described by different specific models in producing meaningful and interpretable parameter estimations and accurate link prediction.

14.9STAug 31, 2019

Statistical Inferences of Linear Forms for Noisy Matrix Completion

Dong Xia, Ming Yuan

We introduce a flexible framework for making inferences about general linear forms of a large matrix based on noisy observations of a subset of its entries. In particular, under mild regularity conditions, we develop a universal procedure to construct asymptotically normal estimators of its linear forms through double-sample debiasing and low-rank projection whenever an entry-wise consistent estimator of the matrix is available. These estimators allow us to subsequently construct confidence intervals for and test hypotheses about the linear forms. Our proposal was motivated by a careful perturbation analysis of the empirical singular spaces under the noisy matrix completion model which might be of independent interest. The practical merits of our proposed inference procedure are demonstrated on both simulated and real-world data examples.

19.8STJan 2, 2019

Normal Approximation and Confidence Region of Singular Subspaces

Dong Xia

This paper is on the normal approximation of singular subspaces when the noise matrix has i.i.d. entries. Our contributions are three-fold. First, we derive an explicit representation formula of the empirical spectral projectors. The formula is neat and holds for deterministic matrix perturbations. Second, we calculate the expected projection distance between the empirical singular subspaces and true singular subspaces. Our method allows obtaining arbitrary $k$-th order approximation of the expected projection distance. Third, we prove the non-asymptotical normal approximation of the projection distance with different levels of bias corrections. By the $\lceil \log(d_1+d_2)\rceil$-th order bias corrections, the asymptotical normality holds under optimal signal-to-noise ration (SNR) condition where $d_1$ and $d_2$ denote the matrix sizes. In addition, it shows that higher order approximations are unnecessary when $|d_1-d_2|=O((d_1+d_2)^{1/2})$. Finally, we provide comprehensive simulation results to merit our theoretic discoveries. Unlike the existing results, our approach is non-asymptotical and the convergence rates are established. Our method allows the rank $r$ to diverge as fast as $o((d_1+d_2)^{1/3})$. Moreover, our method requires no eigen-gap condition (except the SNR) and no constraints between $d_1$ and $d_2$.

4.3STAug 24, 2018

Non-asymptotic bounds for percentiles of independent non-identical random variables

Dong Xia

This note displays an interesting phenomenon for percentiles of independent but non-identical random variables. Let $X_1,\cdots,X_n$ be independent random variables obeying non-identical continuous distributions and $X^{(1)}\geq \cdots\geq X^{(n)}$ be the corresponding order statistics. For any $p\in(0,1)$, we investigate the $100(1-p)$%-th percentile $X^{(pn)}$ and prove non-asymptotic bounds for $X^{(pn)}$. In particular, for a wide class of distributions, we discover an intriguing connection between their median and the harmonic mean of the associated standard deviations. For example, if $X_k\sim\mathcal{N}(0,σ_k^2)$ for $k=1,\cdots,n$ and $p=\frac{1}{2}$, we show that its median $\big|{\rm Med}\big(X_1,\cdots,X_n\big)\big|= O_P\Big(n^{1/2}\cdot\big(\sum_{k=1}^nσ_k^{-1}\big)^{-1}\Big)$ as long as $\{σ_k\}_{k=1}^n$ satisfy certain mild non-dispersion property.

16.1STMay 24, 2018

Confidence Region of Singular Subspaces for Low-rank Matrix Regression

Dong Xia

Low-rank matrix regression refers to the instances of recovering a low-rank matrix based on specially designed measurements and the corresponding noisy outcomes. In the last decade, numerous statistical methodologies have been developed for efficiently recovering the unknown low-rank matrices. However, in some applications, the unknown singular subspace is scientifically more important than the low-rank matrix itself. In this article, we revisit the low-rank matrix regression model and introduce a two-step procedure to construct confidence regions of the singular subspace. The procedure involves the de-biasing for the typical low-rank estimators after which we calculate the empirical singular vectors. We investigate the distribution of the joint projection distance between the empirical singular subspace and the unknown true singular subspace. We specifically prove the asymptotical normality of the joint projection distance with data-dependent centering and normalization when $r^{3/2}(m_1+m_2)^{3/2}=o(n/\log n)$ where $m_1, m_2$ denote the matrix row and column sizes, $r$ is the rank and $n$ is the number of independent random measurements. Consequently, we propose data-dependent confidence regions of the true singular subspace which attains any pre-determined confidence level asymptotically. In addition, non-asymptotical convergence rates are also established. Numerical results are presented to demonstrate the merits of our methods.

16.0MLNov 14, 2017

Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries

Dong Xia, Ming Yuan, Cun-Hui Zhang

In this article, we develop methods for estimating a low rank tensor from noisy observations on a subset of its entries to achieve both statistical and computational efficiencies. There have been a lot of recent interests in this problem of noisy tensor completion. Much of the attention has been focused on the fundamental computational challenges often associated with problems involving higher order tensors, yet very little is known about their statistical performance. To fill in this void, in this article, we characterize the fundamental statistical limits of noisy tensor completion by establishing minimax optimal rates of convergence for estimating a $k$th order low rank tensor under the general $\ell_p$ ($1\le p\le 2$) norm which suggest significant room for improvement over the existing approaches. Furthermore, we propose a polynomial-time computable estimating procedure based upon power iteration and a second-order spectral initialization that achieves the optimal rates of convergence. Our method is fairly easy to implement and numerical experiments are presented to further demonstrate the practical merits of our estimator.

8.6MEOct 31, 2017

Effective Tensor Sketching via Sparsification

Dong Xia, Ming Yuan

In this paper, we investigate effective sketching schemes via sparsification for high dimensional multilinear arrays or tensors. More specifically, we propose a novel tensor sparsification algorithm that retains a subset of the entries of a tensor in a judicious way, and prove that it can attain a given level of approximation accuracy in terms of tensor spectral norm with a much smaller sample complexity when compared with existing approaches. In particular, we show that for a $k$th order $d\times\cdots\times d$ cubic tensor of {\it stable rank} $r_s$, the sample size requirement for achieving a relative error $\varepsilon$ is, up to a logarithmic factor, of the order $r_s^{1/2} d^{k/2} /\varepsilon$ when $\varepsilon$ is relatively large, and $r_s d /\varepsilon^2$ and essentially optimal when $\varepsilon$ is sufficiently small. It is especially noteworthy that the sample size requirement for achieving a high accuracy is of an order independent of $k$. To further demonstrate the utility of our techniques, we also study how higher order singular value decomposition (HOSVD) of large tensors can be efficiently approximated via sparsification.

10.8STJul 5, 2017

The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising

Dong Xia, Fan Zhou

The higher order singular value decomposition (HOSVD) of tensors is a generalization of matrix SVD. The perturbation analysis of HOSVD under random noise is more delicate than its matrix counterpart. Recently, polynomial time algorithms have been proposed where statistically optimal estimates of the singular subspaces and the low rank tensors are attainable in the Euclidean norm. In this article, we analyze the sup-norm perturbation bounds of HOSVD and introduce estimators of the singular subspaces with sharp deviation bounds in the sup-norm. We also investigate a low rank tensor denoising estimator and demonstrate its fast convergence rate with respect to the entry-wise errors. The sup-norm perturbation bounds reveal unconventional phase transitions for statistical learning applications such as the exact clustering in high dimensional Gaussian mixture model and the exact support recovery in sub-tensor localizations. In addition, the bounds established for HOSVD also elaborate the one-sided sup-norm perturbation bounds for the singular subspaces of unbalanced (or fat) matrices.

22.6STMar 8, 2017

Tensor SVD: Statistical and Computational Limits

Anru Zhang, Dong Xia

In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits for tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). In particular, with strong SNR, we show that the classical higher-order orthogonal iteration achieves the minimax optimal rate of convergence in estimation; with weak SNR, the information-theoretical lower bound implies that it is impossible to have consistent estimation in general; with moderate SNR, we show that the non-convex maximum likelihood estimation provides optimal solution, but with NP-hard computational cost; moreover, under the hardness hypothesis of hypergraphic planted clique detection, there are no polynomial-time algorithms performing consistently in general.

13.1MLFeb 22, 2017

On Polynomial Time Methods for Exact Low Rank Tensor Completion

Dong Xia, Ming Yuan

In this paper, we investigate the sample size requirement for exact recovery of a high order tensor of low rank from a subset of its entries. We show that a gradient descent algorithm with initial value obtained from a spectral method can, in particular, reconstruct a ${d\times d\times d}$ tensor of multilinear ranks $(r,r,r)$ with high probability from as few as $O(r^{7/2}d^{3/2}\log^{7/2}d+r^7d\log^6d)$ entries. In the case when the ranks $r=O(1)$, our sample size requirement matches those for nuclear norm minimization (Yuan and Zhang, 2016a), or alternating least squares assuming orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier approaches, however, our method is efficient to compute, easy to implement, and does not impose extra structures on the tensor. Numerical results are presented to further demonstrate the merits of the proposed approach.

3.6MLApr 15, 2016

Estimation of low rank density matrices: bounds in Schatten norms and other distances

Dong Xia, Vladimir Koltchinskii

Let ${\mathcal S}_m$ be the set of all $m\times m$ density matrices (Hermitian positively semi-definite matrices of unit trace). Consider a problem of estimation of an unknown density matrix $ρ\in {\mathcal S}_m$ based on outcomes of $n$ measurements of observables $X_1,\dots, X_n\in {\mathbb H}_m$ (${\mathbb H}_m$ being the space of $m\times m$ Hermitian matrices) for a quantum system identically prepared $n$ times in state $ρ.$ Outcomes $Y_1,\dots, Y_n$ of such measurements could be described by a trace regression model in which ${\mathbb E}_ρ(Y_j|X_j)={\rm tr}(ρX_j), j=1,\dots, n.$ The design variables $X_1,\dots, X_n$ are often sampled at random from the uniform distribution in an orthonormal basis $\{E_1,\dots, E_{m^2}\}$ of ${\mathbb H}_m$ (such as Pauli basis). The goal is to estimate the unknown density matrix $ρ$ based on the data $(X_1,Y_1), \dots, (X_n,Y_n).$ Let $$ \hat Z:=\frac{m^2}{n}\sum_{j=1}^n Y_j X_j $$ and let $\check ρ$ be the projection of $\hat Z$ onto the convex set ${\mathcal S}_m$ of density matrices. It is shown that for estimator $\check ρ$ the minimax lower bounds in classes of low rank density matrices (established earlier) are attained up logarithmic factors for all Schatten $p$-norm distances, $p\in [1,\infty]$ and for Bures version of quantum Hellinger distance. Moreover, for a slightly modified version of estimator $\check ρ$ the same property holds also for quantum relative entropy (Kullback-Leibler) distance between density matrices.

17.4MLJul 17, 2015

Optimal Estimation of Low Rank Density Matrices

Vladimir Koltchinskii, Dong Xia

The density matrices are positively semi-definite Hermitian matrices of unit trace that describe the state of a quantum system. The goal of the paper is to develop minimax lower bounds on error rates of estimation of low rank density matrices in trace regression models used in quantum state tomography (in particular, in the case of Pauli measurements) with explicit dependence of the bounds on the rank and other complexity parameters. Such bounds are established for several statistically relevant distances, including quantum versions of Kullback-Leibler divergence (relative entropy distance) and of Hellinger distance (so called Bures distance), and Schatten $p$-norm distances. Sharp upper bounds and oracle inequalities for least squares estimator with von Neumann entropy penalization are obtained showing that minimax lower bounds are attained (up to logarithmic factors) for these distances.

1.4MLDec 26, 2014

Exploring Sparsity in Multi-class Linear Discriminant Analysis

Dong Xia

Recent studies in the literature have paid much attention to the sparsity in linear classification tasks. One motivation of imposing sparsity assumption on the linear discriminant direction is to rule out the noninformative features, making hardly contribution to the classification problem. Most of those work were focused on the scenarios of binary classification. In the presence of multi-class data, preceding researches recommended individually pairwise sparse linear discriminant analysis(LDA). However, further sparsity should be explored. In this paper, an estimator of grouped LASSO type is proposed to take advantage of sparsity for multi-class data. It enjoys appealing non-asymptotic properties which allows insignificant correlations among features. This estimator exhibits superior capability on both simulated and real data.

2.7MLMar 25, 2014

Optimal Schatten-q and Ky-Fan-k Norm Rate of Low Rank Matrix Estimation

Dong Xia

In this paper, we consider low rank matrix estimation using either matrix-version Dantzig Selector $\hat{A}_λ^d$ or matrix-version LASSO estimator $\hat{A}_λ^L$. We consider sub-Gaussian measurements, $i.e.$, the measurements $X_1,\ldots,X_n\in\mathbb{R}^{m\times m}$ have $i.i.d.$ sub-Gaussian entries. Suppose $\textrm{rank}(A_0)=r$. We proved that, when $n\geq Cm[r^2\vee r\log(m)\log(n)]$ for some $C>0$, both $\hat{A}_λ^d$ and $\hat{A}_λ^L$ can obtain optimal upper bounds(except some logarithmic terms) for estimation accuracy under spectral norm. By applying metric entropy of Grassmann manifolds, we construct (near) matching minimax lower bound for estimation accuracy under spectral norm. We also give upper bounds and matching minimax lower bound(except some logarithmic terms) for estimation accuracy under Schatten-q norm for every $1\leq q\leq\infty$. As a direct corollary, we show both upper bounds and minimax lower bounds of estimation accuracy under Ky-Fan-k norms for every $1\leq k\leq m$.