Zhisong Pan

14papers

199citations

Novelty53%

AI Score28

Ranked #154,228 of 205,806 authors (top 75%)#33,520 in LG (top 79%)

14 Papers

LGApr 18, 2022

A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks

Xin Liu, Wei Tao, Zhisong Pan

Momentum methods, including heavy-ball~(HB) and Nesterov's accelerated gradient~(NAG), are widely used in training neural networks for their fast convergence. However, there is a lack of theoretical guarantees for their convergence and acceleration since the optimization landscape of the neural network is non-convex. Nowadays, some works make progress towards understanding the convergence of momentum methods in an over-parameterized regime, where the number of the parameters exceeds that of the training instances. Nonetheless, current results mainly focus on the two-layer neural network, which are far from explaining the remarkable success of the momentum methods in training deep neural networks. Motivated by this, we investigate the convergence of NAG with constant learning rate and momentum parameter in training two architectures of deep linear networks: deep fully-connected linear neural networks and deep linear ResNets. Based on the over-parameterization regime, we first analyze the residual dynamics induced by the training trajectory of NAG for a deep fully-connected linear neural network under the random Gaussian initialization. Our results show that NAG can converge to the global minimum at a $(1 - \mathcal{O}(1/\sqrtκ))^t$ rate, where $t$ is the iteration number and $κ> 1$ is a constant depending on the condition number of the feature matrix. Compared to the $(1 - \mathcal{O}(1/κ))^t$ rate of GD, NAG achieves an acceleration over GD. To the best of our knowledge, this is the first theoretical guarantee for the convergence of NAG to the global minimum in training deep neural networks. Furthermore, we extend our analysis to deep linear ResNets and derive a similar convergence result.

LGAug 8, 2022

Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

Xin Liu, Wei Tao, Wei Li et al.

Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum during training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorse of first-order gradient methods owning to their accelerated convergence. In practice, NAG often exhibits superior performance than HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of the two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds of the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.

LGJun 20, 2022

FedSSO: A Federated Server-Side Second-Order Optimization Algorithm

Xin Ma, Renyi Bao, Jinpeng Jiang et al.

In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communication for second-order updates between clients and server entirely. We provide theoretical guarantee for convergence of our novel method, and empirically demonstrate our fast convergence and communication savings in both convex and non-convex settings.

AIMay 23, 2022

Multiple Domain Cyberspace Attack and Defense Game Based on Reward Randomization Reinforcement Learning

Lei Zhang, Yu Pan, Yi Liu et al.

The existing network attack and defense method can be regarded as game, but most of the game only involves network domain, not multiple domain cyberspace. To address this challenge, this paper proposed a multiple domain cyberspace attack and defense game model based on reinforcement learning. We define the multiple domain cyberspace include physical domain, network domain and digital domain. By establishing two agents, representing the attacker and the defender respectively, defender will select the multiple domain actions in the multiple domain cyberspace to obtain defender's optimal reward by reinforcement learning. In order to improve the defense ability of defender, a game model based on reward randomization reinforcement learning is proposed. When the defender takes the multiple domain defense action, the reward is randomly given and subject to linear distribution, so as to find the better defense policy and improve defense success rate. The experimental results show that the game model can effectively simulate the attack and defense state of multiple domain cyberspace, and the proposed method has a higher defense success rate than DDPG and DQN.

AIMay 16, 2022

KGRGRL: A User's Permission Reasoning Method Based on Knowledge Graph Reward Guidance Reinforcement Learning

Lei Zhang, Yu Pan, Yi Liu et al.

In general, multiple domain cyberspace security assessments can be implemented by reasoning user's permissions. However, while existing methods include some information from the physical and social domains, they do not provide a comprehensive representation of cyberspace. Existing reasoning methods are also based on expert-given rules, resulting in inefficiency and a low degree of intelligence. To address this challenge, we create a Knowledge Graph (KG) of multiple domain cyberspace in order to provide a standard semantic description of the multiple domain cyberspace. Following that, we proposed a user's permissions reasoning method based on reinforcement learning. All permissions in cyberspace are represented as nodes, and an agent is trained to find all permissions that user can have according to user's initial permissions and cyberspace KG. We set 10 reward setting rules based on the features of cyberspace KG in the reinforcement learning of reward information setting, so that the agent can better locate user's all permissions and avoid blindly finding user's permissions. The results of the experiments showed that the proposed method can successfully reason about user's permissions and increase the intelligence level of the user's permissions reasoning method. At the same time, the F1 value of the proposed method is 6% greater than that of the Translating Embedding (TransE) method.

LGOct 9, 2023

On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks

Xin Liu, Wei li, Dazhi Zhan et al.

Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data. In practice, FL encounters challenges in dealing with partial client participation due to the limited bandwidth, intermittent connection and strict synchronized delay. Simultaneously, there exist few theoretical convergence guarantees in this practical setting, especially when associated with the non-convex optimization of neural networks. To bridge this gap, we focus on the training problem of federated averaging (FedAvg) method for two canonical models: a deep linear network and a two-layer ReLU network. Under the over-parameterized assumption, we provably show that FedAvg converges to a global minimum at a linear rate $\mathcal{O}\left((1-\frac{min_{i \in [t]}|S_i|}{N^2})^t\right)$ after $t$ iterations, where $N$ is the number of clients and $|S_i|$ is the number of the participated clients in the $i$-th iteration. Experimental evaluations confirm our theoretical results.

CVDec 11, 2021Code

Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting

Junhua Zou, Zhisong Pan, Junyang Qiu et al.

We introduce a three stage pipeline: resized-diverse-inputs (RDIM), diversity-ensemble (DEM) and region fitting, that work together to generate transferable adversarial examples. We first explore the internal relationship between existing attacks, and propose RDIM that is capable of exploiting this relationship. Then we propose DEM, the multi-scale version of RDIM, to generate multi-scale gradients. After the first two steps we transform value fitting into region fitting across iterations. RDIM and region fitting do not require extra running time and these three steps can be well integrated into other attacks. Our best attack fools six black-box defenses with a 93% success rate on average, which is higher than the state-of-the-art gradient-based attacks. Besides, we rethink existing attacks rather than simply stacking new methods on the old ones to get better performance. It is expected that our findings will serve as the beginning of exploring the internal relationship between attack methods. Codes are available at https://github.com/278287847/DEM.

CVJan 1, 2022

Adversarial Attack via Dual-Stage Network Erosion

Yexin Duan, Junhua Zou, Xingyu Zhou et al.

Deep neural networks are vulnerable to adversarial examples, which can fool deep models by adding subtle perturbations. Although existing attacks have achieved promising results, it still leaves a long way to go for generating transferable adversarial examples under the black-box setting. To this end, this paper proposes to improve the transferability of adversarial examples, and applies dual-stage feature-level perturbations to an existing model to implicitly create a set of diverse models. Then these models are fused by the longitudinal ensemble during the iterations. The proposed method is termed Dual-Stage Network Erosion (DSNE). We conduct comprehensive experiments both on non-residual and residual networks, and obtain more transferable adversarial examples with the computational cost similar to the state-of-the-art method. In particular, for the residual networks, the transferability of the adversarial examples can be significantly improved by biasing the residual block information to the skip connections. Our work provides new insights into the architectural vulnerability of neural networks and presents new challenges to the robustness of neural networks.

CVSep 1, 2021

Learning Coated Adversarial Camouflages for Object Detectors

Yexin Duan, Jialin Chen, Xingyu Zhou et al.

An adversary can fool deep neural network object detectors by generating adversarial noises. Most of the existing works focus on learning local visible noises in an adversarial "patch" fashion. However, the 2D patch attached to a 3D object tends to suffer from an inevitable reduction in attack performance as the viewpoint changes. To remedy this issue, this work proposes the Coated Adversarial Camouflage (CAC) to attack the detectors in arbitrary viewpoints. Unlike the patch trained in the 2D space, our camouflage generated by a conceptually different training framework consists of 3D rendering and dense proposals attack. Specifically, we make the camouflage perform 3D spatial transformations according to the pose changes of the object. Based on the multi-view rendering results, the top-n proposals of the region proposal network are fixed, and all the classifications in the fixed dense proposals are attacked simultaneously to output errors. In addition, we build a virtual 3D scene to fairly and reproducibly evaluate different attacks. Extensive experiments demonstrate the superiority of CAC over the existing attacks, and it shows impressive performance both in the virtual scene and the real world. This poses a potential threat to the security-critical computer vision systems.

LGJul 5, 2021

Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks

Xin Liu, Zhisong Pan, Wei Tao

Momentum methods, such as heavy ball method~(HB) and Nesterov's accelerated gradient method~(NAG), have been widely used in training neural networks by incorporating the history of gradients into the current updating process. In practice, they often provide improved performance over (stochastic) gradient descent~(GD) with faster convergence. Despite these empirical successes, theoretical understandings of their accelerated convergence rates are still lacking. Recently, some attempts have been made by analyzing the trajectories of gradient-based methods in an over-parameterized regime, where the number of the parameters is significantly larger than the number of the training instances. However, the majority of existing theoretical work is mainly concerned with GD and the established convergence result of NAG is inferior to HB and GD, which fails to explain the practical success of NAG. In this paper, we take a step towards closing this gap by analyzing NAG in training a randomly initialized over-parameterized two-layer fully connected neural network with ReLU activation. Despite the fact that the objective function is non-convex and non-smooth, we show that NAG converges to a global minimum at a non-asymptotic linear rate $(1-Θ(1/\sqrtκ))^t$, where $κ> 1$ is the condition number of a gram matrix and $t$ is the number of the iterations. Compared to the convergence rate $(1-Θ(1/κ))^t$ of GD, our result provides theoretical guarantees for the acceleration of NAG in neural network training. Furthermore, our findings suggest that NAG and HB have similar convergence rate. Finally, we conduct extensive experiments on six benchmark datasets to validate the correctness of our theoretical results.

LGDec 29, 2020

Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

Wei Tao, Wei Li, Zhisong Pan et al.

Averaging scheme has attracted extensive attention in deep learning as well as traditional machine learning. It achieves theoretically optimal convergence and also improves the empirical model performance. However, there is still a lack of sufficient convergence analysis for strongly convex optimization. Typically, the convergence about the last iterate of gradient descent methods, which is referred to as individual convergence, fails to attain its optimality due to the existence of logarithmic factor. In order to remove this factor, we first develop gradient descent averaging (GDA), which is a general projection-based dual averaging algorithm in the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), where primal and dual averaging schemes are simultaneously utilized. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.

AIJul 9, 2020

Weakness Analysis of Cyberspace Configuration Based on Reinforcement Learning

Lei Zhang, Wei Bai, Shize Guo et al.

In this work, we present a learning-based approach to analysis cyberspace configuration. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of agents as attackers, our method becomes better at rapidly finding attack paths for previously hidden paths, especially in multiple domain cyberspace. To achieve these results, we pose finding attack paths as a Reinforcement Learning (RL) problem and train an agent to find multiple domain attack paths. To enable our RL policy to find more hidden attack paths, we ground representation introduction an multiple domain action select module in RL. By designing a simulated cyberspace experimental environment to verify our method. Our objective is to find more hidden attack paths, to analysis the weakness of cyberspace configuration. The experimental results show that our method can find more hidden multiple domain attack paths than existing baselines methods.

CVJul 8, 2020

Making Adversarial Examples More Transferable and Indistinguishable

Junhua Zou, Yexin Duan, Boyu Li et al.

Fast gradient sign attack series are popular methods that are used to generate adversarial examples. However, most of the approaches based on fast gradient sign attack series cannot balance the indistinguishability and transferability due to the limitations of the basic sign structure. To address this problem, we propose a method, called Adam Iterative Fast Gradient Tanh Method (AI-FGTM), to generate indistinguishable adversarial examples with high transferability. Besides, smaller kernels and dynamic step size are also applied to generate adversarial examples for further increasing the attack success rates. Extensive experiments on an ImageNet-compatible dataset show that our method generates more indistinguishable adversarial examples and achieves higher attack success rates without extra running time and resource. Our best transfer-based attack NI-TI-DI-AITM can fool six classic defense models with an average success rate of 89.3% and three advanced defense models with an average success rate of 82.7%, which are higher than the state-of-the-art gradient-based attacks. Additionally, our method can also reduce nearly 20% mean perturbation. We expect that our method will serve as a new baseline for generating adversarial examples with better transferability and indistinguishability.

MLMar 17, 2014

Multi-task Feature Selection based Anomaly Detection

Longqi Yang, Yibing Wang, Zhisong Pan et al.

Network anomaly detection is still a vibrant research area. As the fast growth of network bandwidth and the tremendous traffic on the network, there arises an extremely challengeable question: How to efficiently and accurately detect the anomaly on multiple traffic? In multi-task learning, the traffic consisting of flows at different time periods is considered as a task. Multiple tasks at different time periods performed simultaneously to detect anomalies. In this paper, we apply the multi-task feature selection in network anomaly detection area which provides a powerful method to gather information from multiple traffic and detect anomalies on it simultaneously. In particular, the multi-task feature selection includes the well-known l1-norm based feature selection as a special case given only one task. Moreover, we show that the multi-task feature selection is more accurate by utilizing more information simultaneously than the l1-norm based method. At the evaluation stage, we preprocess the raw data trace from trans-Pacific backbone link between Japan and the United States, label with anomaly communities, and generate a 248-feature dataset. We show empirically that the multi-task feature selection outperforms independent l1-norm based feature selection on real traffic dataset.