Jiansheng Yang

h-index: 15

Papers: 7

Citations: 169

Novelty: 50%

AI Score: 41

Ranked #65,390 of 194,257 authors (top 34%); #14,722 in LG (top 37%)

7 Papers

16.0 | LG | Oct 30, 2023 | Code

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

Yifei Wang, Liangchen Li, Jiansheng Yang et al.

Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers have recently noticed that AT suffers from severe robust overfitting, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance of the minimax game by giving the trainer a stronger memorization ability, and show that this imbalance induces robust overfitting through the memorization of non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players, either by regularizing the trainer's capacity or by improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) attains good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.
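
For orientation, the minimax game mentioned in this abstract is commonly written as the standard adversarial training objective below. This is a sketch in conventional notation, not taken from the abstract: the symbols $\theta$ (model parameters), $\delta$ (perturbation), $\epsilon$ (perturbation budget), $\ell$ (loss), $f_\theta$ (model) and $\mathcal{D}$ (data distribution) are all assumptions introduced here.

$$
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x+\delta),\, y\big) \Big]
$$

In this reading, the outer minimization is the trainer and the inner maximization is the attacker.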

4.6 | LG | Nov 29, 2024

Learning Expressive Random Feature Models via Parametrized Activations

Zailin Ma, Jiansheng Yang, Yaodong Yang

The random feature (RF) method is a powerful kernel approximation technique, but it is typically equipped with fixed activation functions, limiting its adaptability across diverse tasks. To overcome this limitation, we introduce the Random Feature Model with Learnable Activation Functions (RFLAF), a novel statistical model that parameterizes activation functions as weighted sums of basis functions within the random feature framework. Examples of basis functions include radial basis functions (RBFs), spline functions, and polynomials. For theoretical results, we consider RBFs as representative basis functions. We start with a single RBF as the activation, and then extend the results to multiple RBFs, demonstrating that RF models with a learnable activation component greatly expand the represented function space. We provide estimates of the number of samples and random features required to achieve low excess risk. For experiments, we test RFLAF with three types of bases: radial basis functions, spline functions, and polynomials. Experimental results show that RFLAFs with RBFs and splines consistently outperform other RF models, with RBFs being 3 times faster to compute than splines. We then unfreeze the first-layer parameters and retrain the models, validating the expressivity advantage of learnable activation components on regular two-layer neural networks. Our work provides a deeper understanding of the role of learnable activation functions within modern neural network architectures.
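
The idea of an activation built as a weighted sum of basis functions can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `rbf_basis` and `rflaf_forward`, the Gaussian form of the RBF, and the parameters `centers`, `width`, `a` and `v` are all hypothetical choices made for the example.

```python
import numpy as np

def rbf_basis(z, centers, width):
    """Evaluate K Gaussian RBFs at every pre-activation.

    z: array of shape (n, m); centers: array of shape (K,).
    Returns an array of shape (n, m, K).
    """
    return np.exp(-((z[..., None] - centers) ** 2) / (2.0 * width ** 2))

def rflaf_forward(X, W, a, v, centers, width):
    """Forward pass of a random feature model with a learnable activation.

    X: inputs (n, d); W: frozen random first-layer weights (m, d);
    a: learnable basis coefficients (K,); v: learnable output weights (m,).
    The activation is assumed to be sigma(z) = sum_k a[k] * RBF_k(z).
    """
    Z = X @ W.T                       # (n, m) random projections
    B = rbf_basis(Z, centers, width)  # (n, m, K) basis evaluations
    H = B @ a                         # (n, m) learnable activation output
    return H @ v                      # (n,) model predictions

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(16, 3))          # sampled once and kept fixed
centers = np.linspace(-2.0, 2.0, 4)
a = rng.normal(size=4)
v = rng.normal(size=16)
y_hat = rflaf_forward(X, W, a, v, centers, width=1.0)
```

In this sketch only `a` and `v` would be trained while `W` stays fixed, which is what distinguishes it from an ordinary two-layer network.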

4.1 | LG | Oct 17, 2025

On the Generalization Properties of Learning the Random Feature Models with Learnable Activation Functions

Zailin Ma, Jiansheng Yang, Yaodong Yang

This paper studies the generalization properties of a recently proposed kernel method, the Random Feature models with Learnable Activation Functions (RFLAF). By applying a data-dependent sampling scheme for generating features, we provide the sharpest bounds to date on the number of features required for learning RFLAF in both regression and classification tasks. We provide a unified theorem that describes the complexity of the feature number $s$, and discuss the results for the plain sampling scheme and the data-dependent leverage weighted scheme. Through weighted sampling, the bound on $s$ in the MSE loss case is improved from $\Omega(1/\epsilon^2)$ to $\tilde{\Omega}((1/\epsilon)^{1/t})$ in general $(t\geq 1)$, and even to $\Omega(1)$ when the Gram matrix has a finite rank. For the Lipschitz loss case, the bound is improved from $\Omega(1/\epsilon^2)$ to $\tilde{\Omega}((1/\epsilon^2)^{1/t})$. To learn the weighted RFLAF, we also propose an algorithm that finds an approximate kernel and then applies leverage weighted sampling. Empirical results show that the weighted RFLAF achieves the same performance with significantly fewer features than the plainly sampled RFLAF, validating our theories and the effectiveness of this method.

17.2 | LG | Oct 28, 2021

Residual Relaxation for Multi-view Representation Learning

Yifei Wang, Zhengyang Geng, Feng Jiang et al.

Multi-view methods learn representations by aligning multiple views of the same image, and their performance largely depends on the choice of data augmentation. In this paper, we notice that some otherwise useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective to better exploit stronger augmentations. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment by allowing an adaptive residual vector between different views and encoding the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method can not only improve multi-view methods with existing augmentations, but also benefit from stronger image augmentations like rotation.

6.3 | ML | Jul 1, 2021 | Code

Reparameterized Sampling for Generative Adversarial Networks

Yifei Wang, Yisen Wang, Jiansheng Yang et al.

Recently, sampling methods have been successfully applied to enhance the sample quality of Generative Adversarial Networks (GANs). However, in practice, they typically have poor sample efficiency because proposals are sampled independently from the generator. In this work, we propose REP-GAN, a novel sampling method that allows general dependent proposals by REParameterizing the Markov chains into the latent space of the generator. Theoretically, we show that our reparameterized proposal admits a closed-form Metropolis-Hastings acceptance ratio. Empirically, extensive experiments on synthetic and real datasets demonstrate that REP-GAN substantially improves sample efficiency while also obtaining better sample quality.

23.2 | LG | Feb 22, 2021 | Code

Dissecting the Diffusion Process in Linear Graph Convolutional Networks

Yifei Wang, Yisen Wang, Jiansheng Yang et al.

Graph Convolutional Networks (GCNs) have attracted increasing attention in recent years. A typical GCN layer consists of a linear feature propagation step and a nonlinear transformation step. Recent works show that a linear GCN can achieve performance comparable to the original non-linear GCN while being much more computationally efficient. In this paper, we dissect the feature propagation steps of linear GCNs from the perspective of continuous graph diffusion, and analyze why linear GCNs fail to benefit from more propagation steps. Following that, we propose Decoupled Graph Convolution (DGC), which decouples the terminal time from the number of feature propagation steps, making it more flexible and capable of exploiting a very large number of feature propagation steps. Experiments demonstrate that our proposed DGC improves linear GCNs by a large margin and makes them competitive with many modern variants of non-linear GCNs.
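
One way to picture decoupling a terminal time from the number of propagation steps is a forward-Euler discretization of graph heat diffusion, $dX/dt = -LX$ with $L = I - S$. The sketch below is an illustration under that assumption and is not necessarily the authors' DGC implementation: the helper names `normalized_adjacency` and `diffuse_features`, and the parameters `T` (terminal time) and `K` (number of steps), are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def diffuse_features(X, S, T, K):
    """Propagate features for total time T using K forward-Euler steps.

    Each step applies (1 - dt) * I + dt * S with dt = T / K, which equals
    I - dt * L for L = I - S. T and K can be chosen independently.
    """
    dt = T / K
    step = (1.0 - dt) * np.eye(S.shape[0]) + dt * S
    for _ in range(K):
        X = step @ X
    return X

# Path graph on 4 nodes with 2 features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = normalized_adjacency(A)
X = np.arange(8.0).reshape(4, 2)
X_smooth = diffuse_features(X, S, T=2.0, K=2000)
```

With `T = K` each step reduces to multiplying by `S`, the usual linear propagation. Holding `T` fixed while increasing `K` refines the discretization instead of smoothing the features further.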

1.4 | ML | Jul 2, 2020

Decoder-free Robustness Disentanglement without (Additional) Supervision

Yifei Wang, Dan Peng, Furui Liu et al.

Adversarial Training (AT) was proposed to alleviate the adversarial vulnerability of machine learning models by extracting only robust features from the input. However, this inevitably leads to a severe reduction in accuracy, as it discards non-robust yet useful features. This motivates us to preserve both robust and non-robust features and separate them with disentangled representation learning. Our proposed Adversarial Asymmetric Training (AAT) algorithm can reliably disentangle robust and non-robust representations without additional supervision on robustness. Empirical results show that our method not only preserves accuracy by combining the two representations, but also achieves much better disentanglement than previous work.