Qingcan Wang

5 papers

236 citations

Novelty: 44%

AI Score: 23

Ranked #179,282 of 201,326 authors (top 89%); #39,342 in LG (top 92%)

5 Papers

LG · Nov 2, 2019

Global Convergence of Gradient Descent for Deep Linear Residual Networks

Lei Wu, Qingcan Wang, Chao Ma

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an $\varepsilon$-optimal point in $O(L^3 \log(1/\varepsilon))$ iterations, which scales polynomially with the network depth $L$. Our result and the $\exp(\Omega(L))$ convergence time for the standard initialization (Xavier or near-identity) [Shamir, 2018] together demonstrate the importance of the residual structure and the initialization in the optimization of deep linear neural networks, especially when $L$ is large.
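The setting above can be illustrated with a small experiment. This is a minimal sketch under stated assumptions, not the paper's code: we assume the model $f(x) = W_{L+1}(I + W_L)\cdots(I + W_1)x$, whitened inputs so the loss reduces to $\tfrac12\|W_{L+1}(I+W_L)\cdots(I+W_1) - \Phi\|_F^2$, and we take ZAS to mean that all residual blocks $W_1,\dots,W_L$ and the output layer $W_{L+1}$ start at zero (check the paper for the exact definition). The function name `gd_deep_linear_resnet`, the step size, the depth, and the target matrix are hypothetical.

```python
def eye(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def zeros(d):
    return [[0.0] * d for _ in range(d)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def axpy(A, B, scale=1.0):
    # Returns A + scale * B, elementwise.
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def gd_deep_linear_resnet(target, depth, lr=0.02, steps=2000):
    """GD on 0.5 * ||W_{L+1} (I + W_L) ... (I + W_1) - target||_F^2 from all-zero weights.

    Assumption: all-zero start for W_1..W_L and W_{L+1} stands in for ZAS here.
    """
    d = len(target)
    Ws = [zeros(d) for _ in range(depth)]  # residual blocks W_1, ..., W_L
    out = zeros(d)                          # output layer W_{L+1}
    losses = []
    for _ in range(steps):
        blocks = [axpy(eye(d), W) for W in Ws]  # blocks[l] = I + W_{l+1}
        # prefix[l] = (I + W_l) ... (I + W_1), with prefix[0] = I
        prefix = [eye(d)]
        for B in blocks:
            prefix.append(matmul(B, prefix[-1]))
        # suffix[l] = W_{L+1} (I + W_L) ... (I + W_{l+1}), with suffix[L] = W_{L+1}
        suffix = [None] * depth + [out]
        for l in range(depth - 1, -1, -1):
            suffix[l] = matmul(suffix[l + 1], blocks[l])
        E = axpy(suffix[0], target, -1.0)  # end-to-end error W_{L+1} P - target
        losses.append(0.5 * sum(e * e for row in E for e in row))
        # Gradients of the Frobenius loss with respect to each factor.
        grad_out = matmul(E, transpose(prefix[depth]))
        grads = [matmul(matmul(transpose(suffix[l + 1]), E), transpose(prefix[l]))
                 for l in range(depth)]
        out = axpy(out, grad_out, -lr)
        Ws = [axpy(W, G, -lr) for W, G in zip(Ws, grads)]
    return losses


target = [[-0.5, 0.2], [0.1, 0.4]]  # hypothetical target with one negative eigenvalue
losses = gd_deep_linear_resnet(target, depth=4)
```

With $W_{L+1} = 0$ at the start, every residual block receives a zero gradient on the first step, so only the output layer moves at first; the blocks begin to move once $W_{L+1}$ is nonzero. Because $W_{L+1}$ is trained rather than fixed to the identity, a target with a negative eigenvalue poses no obstruction in this toy run.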

LG · Apr 10, 2019

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections

Weinan E, Chao Ma, Qingcan Wang et al.

The behavior of the gradient descent (GD) algorithm is analyzed for a deep neural network model with skip-connections. It is proved that in the over-parametrized regime, for a suitable initialization, with high probability GD can find a global minimum exponentially fast. Generalization error estimates along the GD path are also established. As a consequence, it is shown that when the target function is in the reproducing kernel Hilbert space (RKHS) with a kernel defined by the initialization, there exist generalizable early-stopping solutions along the GD path. It is also shown that the GD path is uniformly close to the functions given by the related random feature model. Consequently, in this "implicit regularization" setting, the deep neural network model deteriorates to a random feature model. Our results hold for neural networks of any width larger than the input dimension.
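For reference, here is a minimal sketch of a generic random feature model trained by GD, the comparison model named in the abstract. It is not the paper's deep network or its initialization-defined kernel: the 1-D inputs, ReLU features with frozen Gaussian weights, the `sin(3x)` data, and the names `random_feature_gd` and `width` are all hypothetical choices for illustration. Only the output weights are trained.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def random_feature_gd(xs, ys, width=50, steps=200, seed=0):
    """GD on the output weights of a random feature model with 1-D inputs.

    Hypothetical setup: features phi_k(x) = relu(w_k * x + b_k) with
    (w_k, b_k) drawn once from N(0, 1) and then frozen.
    """
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0) for _ in range(width)]
    bs = [rng.gauss(0.0, 1.0) for _ in range(width)]
    feats = [[relu(w * x + b) for w, b in zip(ws, bs)] for x in xs]
    n = len(xs)
    # The Hessian of the loss below is Phi^T Phi / n; its trace bounds its largest
    # eigenvalue, so this step size keeps GD monotone on the quadratic loss.
    trace = sum(p * p for row in feats for p in row) / n
    lr = 1.0 / trace
    a = [0.0] * width  # trainable output weights, zero at initialization
    losses = []
    for _ in range(steps):
        preds = [sum(ak * pk for ak, pk in zip(a, row)) for row in feats]
        resid = [p - y for p, y in zip(preds, ys)]
        losses.append(0.5 * sum(r * r for r in resid) / n)  # loss before this step
        grad = [sum(r * row[k] for r, row in zip(resid, feats)) / n for k in range(width)]
        a = [ak - lr * gk for ak, gk in zip(a, grad)]
    return a, losses


xs = [i / 20 for i in range(21)]
ys = [math.sin(3 * x) for x in xs]
_, losses = random_feature_gd(xs, ys)
```

Since the step size never exceeds the inverse of the largest Hessian eigenvalue, no GD step increases the training loss, and the weights at any intermediate step give one candidate early-stopped solution along the path.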

LG · Mar 6, 2019

A Priori Estimates of the Population Risk for Residual Networks

Weinan E, Chao Ma, Qingcan Wang

Optimal a priori estimates are derived for the population risk, also known as the generalization error, of a regularized residual network model. An important part of the regularized model is the use of a new path norm, called the weighted path norm, as the regularization term. The weighted path norm treats the skip connections and the nonlinearities differently, so that paths with more nonlinearities are regularized by larger weights. The error estimates are a priori in the sense that they depend only on the target function, not on the parameters obtained in the training process. The estimates are optimal, in a high-dimensional setting, in the sense that the bounds for both the approximation and the estimation errors are comparable to the Monte Carlo error rates. A crucial step in the proof is to establish an optimal bound for the Rademacher complexity of the residual networks. Comparisons are made with existing norm-based generalization error bounds.
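The idea that paths with more nonlinearities get larger weights can be made concrete. The sketch below assumes a residual model $h_0 = Vx$, $h_l = h_{l-1} + U_l\,\sigma(W_l h_{l-1})$, $f(x) = u^\top h_L$, and the norm $\big\||u|^\top (I + c|U_L||W_L|)\cdots(I + c|U_1||W_1|)\,|V|\big\|_1$. The factor $c = 3$ is what we recall the paper using, but treat it, the model layout, and the function name `weighted_path_norm` as assumptions to check against the paper.

```python
def abs_matrix(A):
    return [[abs(a) for a in row] for row in A]


def vec_mat(v, A):
    # Row vector v times matrix A.
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def weighted_path_norm(u, V, Ws, Us, c=3.0):
    """|| |u|^T (I + c|U_L||W_L|) ... (I + c|U_1||W_1|) |V| ||_1 for the hypothetical model
    h_0 = V x,  h_l = h_{l-1} + U_l relu(W_l h_{l-1}),  f(x) = u^T h_L.
    """
    r = [abs(x) for x in u]
    # Multiply from the output side inward: block L first, block 1 last.
    for W, U in zip(reversed(Ws), reversed(Us)):
        through_block = vec_mat(vec_mat(r, abs_matrix(U)), abs_matrix(W))  # r |U_l| |W_l|
        r = [ri + c * ti for ri, ti in zip(r, through_block)]  # r (I + c |U_l| |W_l|)
    # All entries are nonnegative, so the l1 norm is just the sum.
    return sum(vec_mat(r, abs_matrix(V)))


# One scalar block: the skip path contributes 1, the nonlinear path c * |U| * |W| = 6.
print(weighted_path_norm([1.0], [[1.0]], [[[2.0]]], [[[1.0]]]))  # 7.0
```

Each factor $I + c|U_l||W_l|$ lets a path either take the skip connection (weight 1) or pass through the nonlinearity (weight $c$ times the absolute weights), so a path crossing $k$ nonlinearities picks up a factor $c^k$.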

LG · Jul 1, 2018

Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions

Weinan E, Qingcan Wang

We prove that for analytic functions in low dimension, the convergence rate of the deep neural network approximation is exponential.
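A standard building block behind exponential-rate ReLU approximation results is Yarotsky's sawtooth construction for $x^2$ on $[0,1]$; we have not confirmed it is the exact construction used in this paper, so read the following as an illustration of the phenomenon only. With the tent map $g(x) = 2\sigma(x) - 4\sigma(x - \tfrac12) + 2\sigma(x - 1)$ and $g_s$ its $s$-fold composition, $f_m(x) = x - \sum_{s=1}^{m} g_s(x)/4^s$ is the piecewise linear interpolant of $x^2$ at the nodes $k/2^m$, so its error is at most $2^{-2m-2}$ while the network depth grows only linearly in $m$. The function names are hypothetical.

```python
def relu(z):
    return max(z, 0.0)


def hat(x):
    # g(x) = 2 relu(x) - 4 relu(x - 1/2) + 2 relu(x - 1): the tent map on [0, 1]
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1)


def approx_square(x, m):
    # f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4^s, where g_s is g composed s times.
    total, g = x, x
    for s in range(1, m + 1):
        g = hat(g)
        total -= g / 4 ** s
    return total


grid = [i / 10000 for i in range(10001)]
errors = [max(abs(approx_square(x, m) - x * x) for x in grid) for m in range(1, 7)]
```

Each extra composition of `hat` adds a constant number of ReLU layers and divides the maximum error by 4, which is the sense in which the error decays exponentially in depth.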

LG · May 21, 2018

Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference

Ruying Bao, Sihang Liang, Qingcan Wang

Deep neural networks have been shown to be vulnerable to adversarial attacks, where small perturbations intentionally added to the original inputs can fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), which extracts the semantic features of the input and filters out the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space; mutual information is also used to disentangle the semantically meaningful features. Through this bidirectional mapping, adversarial inputs can be reconstructed as denoised data, which can then be fed into any pre-trained classifier. We empirically show the quality of the reconstructed images and the effectiveness of the defense.