Fan Zhou

h-index: 10

3 papers

25 citations

Novelty: 53%

AI Score: 23

Ranked #174,679 of 194,257 authors (top 90%); #37,779 in LG (top 94%)

3 Papers

8.6 LG Mar 12, 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction

Fan Zhou, Guojing Cong

Reducing communication when training large-scale machine learning applications on distributed platforms remains a major challenge. To address this issue, we propose a distributed hierarchical averaging stochastic gradient descent (Hier-AVG) algorithm that introduces local reduction so that global reduction can be performed infrequently. As a general type of parallel SGD, Hier-AVG can reproduce several popular synchronous parallel SGD variants by adjusting its parameters. We show that Hier-AVG with infrequent global reduction can still achieve the standard convergence rate for non-convex optimization problems. In addition, we show that more frequent local averaging with more participants involved can lead to faster training convergence. By comparing Hier-AVG with another popular distributed training algorithm, K-AVG, we show that by deploying local averaging with fewer global averaging steps, Hier-AVG can still achieve comparable training speed while frequently achieving better test accuracy. This indicates that local averaging can serve as an alternative remedy to effectively reduce communication overhead when the number of learners is large. Experimental results of Hier-AVG with several state-of-the-art deep neural nets on CIFAR-10 and IMAGENET-1K are presented to validate our analysis and show its superiority.
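The interplay between local and global averaging described in this abstract can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' implementation: the names `hier_avg`, `group_size`, `local_every`, and `global_every` are hypothetical, the learners are simulated sequentially in one process, and the model is a single scalar weight fitted by least squares.

```python
import random


def grad(w, x, y):
    # Gradient of the squared error 0.5 * (w * x - y) ** 2 with respect to w.
    return (w * x - y) * x


def mean(values):
    return sum(values) / len(values)


def hier_avg(data, n_learners=8, group_size=4, local_every=2,
             global_every=8, steps=200, lr=0.05, seed=0):
    """Toy hierarchical averaging SGD (all parameter names are assumptions).

    Each simulated learner keeps its own copy of the weight. Learners in the
    same group average every `local_every` steps (cheap local reduction);
    all learners average every `global_every` steps (costly global reduction).
    """
    rng = random.Random(seed)
    weights = [0.0] * n_learners
    global_reductions = 0
    for t in range(1, steps + 1):
        # One SGD step per learner on a randomly drawn sample.
        for i in range(n_learners):
            x, y = rng.choice(data)
            weights[i] -= lr * grad(weights[i], x, y)
        if t % global_every == 0:
            # Global reduction: every learner receives the overall mean.
            avg = mean(weights)
            weights = [avg] * n_learners
            global_reductions += 1
        elif t % local_every == 0:
            # Local reduction: average only within each group.
            for start in range(0, n_learners, group_size):
                group = weights[start:start + group_size]
                weights[start:start + len(group)] = [mean(group)] * len(group)
    return mean(weights), global_reductions
```

For example, `hier_avg([(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)])` fits the slope of a noise-free line while performing a global reduction only once every eight steps. Setting `group_size=1` disables local averaging, which in this sketch leaves only the periodic global averaging.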

1.0 ML Feb 17, 2018

Nonparametric Estimation of Low Rank Matrix Valued Function

Fan Zhou

Let $A:[0,1]\rightarrow\mathbb{H}_m$ (the space of Hermitian matrices) be a matrix-valued function which is low rank with entries in the Hölder class $\Sigma(\beta,L)$. The goal of this paper is to study statistical estimation of $A$ based on the regression model $\mathbb{E}(Y_j|\tau_j,X_j) = \langle A(\tau_j), X_j \rangle,$ where $\tau_j$ are i.i.d. uniformly distributed in $[0,1]$, $X_j$ are i.i.d. matrix completion sampling matrices, and $Y_j$ are independent bounded responses. We propose an innovative nuclear norm penalized local polynomial estimator and establish an upper bound on its point-wise risk measured by the Frobenius norm. Then we extend this estimator globally and prove an upper bound on its integrated risk measured by the $L_2$-norm. We also propose another new estimator based on bias-reducing kernels to study the case when $A$ is not necessarily low rank and establish an upper bound on its risk measured by the $L_{\infty}$-norm. We show that the obtained rates are all optimal up to some logarithmic factor in the minimax sense. Finally, we propose an adaptive estimation procedure based on Lepskii's method and model selection with data splitting, which is computationally efficient and can be easily implemented and parallelized.
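Nuclear norm penalties of the kind mentioned in this abstract are commonly handled through singular value soft-thresholding, which is the proximal operator of the nuclear norm. The sketch below shows only that generic building block, assuming NumPy is available; it is not the paper's local polynomial estimator, and the name `svd_soft_threshold` is hypothetical.

```python
import numpy as np


def svd_soft_threshold(matrix, lam):
    """Minimize 0.5 * ||X - matrix||_F^2 + lam * ||X||_* over X.

    The minimizer keeps the singular vectors of `matrix` and shrinks each
    singular value by `lam`, clipping at zero, which lowers the rank.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    # Scale the columns of u by the shrunken singular values, then recombine.
    return (u * s_shrunk) @ vt
```

Larger values of `lam` zero out more singular values, so the penalty level directly controls the rank of the result.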

10.8 ST Jul 5, 2017

The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising

Dong Xia, Fan Zhou

The higher-order singular value decomposition (HOSVD) of tensors is a generalization of the matrix SVD. The perturbation analysis of HOSVD under random noise is more delicate than its matrix counterpart. Recently, polynomial-time algorithms have been proposed where statistically optimal estimates of the singular subspaces and the low rank tensors are attainable in the Euclidean norm. In this article, we analyze the sup-norm perturbation bounds of HOSVD and introduce estimators of the singular subspaces with sharp deviation bounds in the sup-norm. We also investigate a low rank tensor denoising estimator and demonstrate its fast convergence rate with respect to the entry-wise errors. The sup-norm perturbation bounds reveal unconventional phase transitions for statistical learning applications such as exact clustering in high-dimensional Gaussian mixture models and exact support recovery in sub-tensor localizations. In addition, the bounds established for HOSVD also elaborate the one-sided sup-norm perturbation bounds for the singular subspaces of unbalanced (or fat) matrices.
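As a rough illustration of HOSVD-based denoising, the following sketch computes a plain truncated HOSVD with NumPy and compares entry-wise errors before and after projection. It is a textbook-style construction under stated assumptions, not the estimator analyzed in the paper; the function names, the synthetic rank-(2, 2, 2) tensor, and the noise level in `demo` are all hypothetical choices.

```python
import numpy as np


def unfold(tensor, mode):
    # Mode-k matricization: rows index the chosen mode, columns the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    # Multiply `matrix` into `tensor` along axis `mode`.
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Project `tensor` onto the leading singular subspaces of each unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])  # leading r left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # compress each mode
    estimate = core
    for mode, u in enumerate(factors):
        estimate = mode_product(estimate, u, mode)  # expand back
    return estimate, factors


def demo(seed=0, n=20, sigma=0.05):
    """Return (max entry-wise error of noisy data, same for the estimate)."""
    rng = np.random.default_rng(seed)
    core = np.zeros((2, 2, 2))
    core[0, 0, 0], core[1, 1, 1] = 10.0, 5.0
    signal = core
    for mode in range(3):
        # Random orthonormal factor for each mode.
        q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
        signal = mode_product(signal, q, mode)
    noisy = signal + sigma * rng.standard_normal(signal.shape)
    estimate, _ = truncated_hosvd(noisy, ranks=(2, 2, 2))
    return np.abs(noisy - signal).max(), np.abs(estimate - signal).max()
```

Calling `demo()` returns the sup-norm (maximum entry-wise) error of the noisy tensor and of the truncated-HOSVD estimate, which is the error metric the abstract focuses on.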