Anru Zhang

13papers

1,147citations

Novelty58%

AI Score29

Ranked #151,962 of 205,806 authors (top 74%)#109 in ST (top 56%)

13 Papers

MLMay 3, 2021

Learning Good State and Action Representations via Tensor Decomposition

Chengzhuo Ni, Yaqi Duan, Munther Dahleh et al.

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation.

MLNov 9, 2019

ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching

Anru Zhang, Yuetian Luo, Garvesh Raskutti et al.

In this paper, we develop a novel procedure for low-rank tensor regression, namely \emph{\underline{I}mportance \underline{S}ketching \underline{L}ow-rank \underline{E}stimation for \underline{T}ensors} (ISLET). The central idea behind ISLET is \emph{importance sketching}, i.e., carefully designed sketches based on both the responses and low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical study that ISLET achieves comparable or better mean-squared error performance to existing state-of-the-art methods while having substantial storage and run-time advantages including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is $1$ or $2$ orders of magnitude faster than baseline methods.

MEJun 28, 2019

Learning Markov models via low-rank optimization

Ziwei Zhu, Xudong Li, Mengdi Wang et al.

Modeling unknown systems from data is a precursor of system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite of having a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum likelihood estimation methods: a convex approach with nuclear-norm regularization and a nonconvex approach with rank constraint. We explicitly derive the statistical rates of both estimators in terms of the Kullback-Leiber divergence and the $\ell_2$ error and also establish a minimax lower bound to assess the tightness of these rates. For computing the nonconvex estimator, we develop a novel DC (difference of convex function) programming algorithm that starts with the convex M-estimator and then successively refines the solution till convergence. Empirical experiments demonstrate consistent superiority of the nonconvex estimator over the convex one.

STSep 6, 2018

Optimal Sparse Singular Value Decomposition for High-dimensional High-order Data

Anru Zhang, Rungang Han

In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named Sparse Tensor Alternating Thresholding for Singular Value Decomposition (STAT-SVD) is proposed. The proposed procedure features a novel double projection \& thresholding scheme, which provides a sharp criterion for thresholding in each iteration. Compared with regular tensor SVD model, STAT-SVD permits more robust estimation under weaker assumptions. Both the upper and lower bounds for estimation accuracy are developed. The proposed procedure is shown to be minimax rate-optimal in a general class of situations. Simulation studies show that STAT-SVD performs well under a variety of configurations. We also illustrate the merits of the proposed procedure on a longitudinal tensor dataset on European country mortality rates.

MLApr 3, 2018

Estimation of Markov Chain via Rank-Constrained Likelihood

Xudong Li, Mengdi Wang, Anru Zhang

This paper studies the estimation of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leiber divergence and the $\ell_2$ risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex function) programming algorithm to tackle the rank-constrained non-smooth optimization problem. Convergence results are established. Experiments show that the proposed estimator achieves better empirical performance than other popular approaches.

MLFeb 8, 2018

Spectral State Compression of Markov Processes

Anru Zhang, Mengdi Wang

Model reduction of Markov processes is a basic problem in modeling state-transition systems. Motivated by the state aggregation approach rooted in control theory, we study the statistical state compression of a discrete-state Markov chain from empirical trajectories. Through the lens of spectral decomposition, we study the rank and features of Markov processes, as well as properties like representability, aggregability, and lumpability. We develop spectral methods for estimating the transition matrix of a low-rank Markov model, estimating the leading subspace spanned by Markov features, and recovering latent structures like state aggregation and lumpable partition of the state space. We prove statistical upper bounds for the estimation errors and nearly matching minimax lower bounds. Numerical studies are performed on synthetic data and a dataset of New York City taxi trips.

STJan 29, 2018

Sparse and Low-rank Tensor Estimation via Cubic Sketchings

Botao Hao, Anru Zhang, Guang Cheng

In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings. A two-stage non-convex implementation is developed based on sparse tensor decomposition and thresholded gradient descent, which ensures exact recovery in the noiseless case and stable recovery in the noisy case with high probability. The non-asymptotic analysis sheds light on an interplay between optimization error and statistical error. The proposed procedure is shown to be rate-optimal under certain conditions. As a technical by-product, novel high-order concentration inequalities are derived for studying high-moment sub-Gaussian tensors. An interesting tensor formulation illustrates the potential application to high-order interaction pursuit in high-dimensional linear regression.

STMar 8, 2017

Tensor SVD: Statistical and Computational Limits

Anru Zhang, Dong Xia

In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits for tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). In particular, with strong SNR, we show that the classical higher-order orthogonal iteration achieves the minimax optimal rate of convergence in estimation; with weak SNR, the information-theoretical lower bound implies that it is impossible to have consistent estimation in general; with moderate SNR, we show that the non-convex maximum likelihood estimation provides optimal solution, but with NP-hard computational cost; moreover, under the hardness hypothesis of hypergraphic planted clique detection, there are no polynomial-time algorithms performing consistently in general.

MENov 3, 2016

Cross: Efficient Low-rank Tensor Completion

Anru Zhang

The completion of tensors, or high-order arrays, attracts significant attention in recent research. Current literature on tensor completion primarily focuses on recovery from a set of uniformly randomly measured entries, and the required number of measurements to achieve recovery is not guaranteed to be optimal. In addition, the implementation of some previous methods is NP-hard. In this article, we propose a framework for low-rank tensor completion via a novel tensor measurement scheme we name Cross. The proposed procedure is efficient and easy to implement. In particular, we show that a third order tensor of Tucker rank-$(r_1, r_2, r_3)$ in $p_1$-by-$p_2$-by-$p_3$ dimensional space can be recovered from as few as $r_1r_2r_3 + r_1(p_1-r_1) + r_2(p_2-r_2) + r_3(p_3-r_3)$ noiseless measurements, which matches the sample complexity lower-bound. In the case of noisy measurements, we also develop a theoretical upper bound and the matching minimax lower bound for recovery error over certain classes of low-rank tensors for the proposed procedure. The results can be further extended to fourth or higher-order tensors. Simulation studies show that the method performs well under a variety of settings. Finally, the procedure is illustrated through a real dataset in neuroimaging.

MEJun 23, 2016

Semi-supervised Inference: General Theory and Estimation of Means

Anru Zhang, Lawrence D. Brown, T. Tony Cai

We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses ("labels"). Otherwise, the formulation is "assumption-lean" in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analysis on both the asymptotic distribution and $\ell_2$-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.

MEApr 8, 2015

Structured Matrix Completion with Applications to Genomic Data Integration

Tianxi Cai, T. Tony Cai, Anru Zhang

Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

STOct 22, 2013

ROP: Matrix recovery via rank-one projections

T. Tony Cai, Anru Zhang

Estimation of low-rank matrices is of significant interest in a range of contemporary applications. In this paper, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the paper also have implications to other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections.

ITJun 5, 2013

Sparse Representation of a Polytope and Recovery of Sparse Signals and Low-rank Matrices

T. Tony Cai, Anru Zhang

This paper considers compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary while leads to sharp results. It is shown that for any given constant $t\ge {4/3}$, in compressed sensing $δ_{tk}^A < \sqrt{(t-1)/t}$ guarantees the exact recovery of all $k$ sparse signals in the noiseless case through the constrained $\ell_1$ minimization, and similarly in affine rank minimization $δ_{tr}^\mathcal{M}< \sqrt{(t-1)/t}$ ensures the exact reconstruction of all matrices with rank at most $r$ in the noiseless case via the constrained nuclear norm minimization. Moreover, for any $ε>0$, $δ_{tk}^A<\sqrt{\frac{t-1}{t}}+ε$ is not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. Similar result also holds for matrix recovery. In addition, the conditions $δ_{tk}^A < \sqrt{(t-1)/t}$ and $δ_{tr}^\mathcal{M}< \sqrt{(t-1)/t}$ are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.