LGMay 27, 2022
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive PowerBinghui Li, Jikai Jin, Han Zhong et al. · stanford
It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of expressive power for deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. This result holds even if the data is linear separable (which means achieving standard generalization is easy), and more generally for any parameterized function classes as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network size to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network size for achieving low robust training and generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
AINov 20, 2022
On the Complexity of Bayesian GeneralizationYu-Zhe Shi, Manjie Xu, John E. Hopcroft et al.
We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied isolated and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up, and the $complexity$ of concepts becomes diverse. Specifically, at the $representational \ level$, we seek to answer how the complexity varies when a visual concept is mapped to the representation space. Prior psychology literature has shown that two types of complexities (i.e., subjective complexity and visual complexity) (Griffiths and Tenenbaum, 2003) build an inverted-U relation (Donderi, 2006; Sun and Firestone, 2021). Leveraging Representativeness of Attribute (RoA), we computationally confirm the following observation: Models use attributes with high RoA to describe visual concepts, and the description length falls in an inverted-U relation with the increment in visual complexity. At the $computational \ level$, we aim to answer how the complexity of representation affects the shift between the rule- and similarity-based generalization. We hypothesize that category-conditioned visual modeling estimates the co-occurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in the rule-based generalization, while the trend is the opposite in the similarity-based generalization.
CVNov 2, 2025
Parameter Interpolation Adversarial Training for Robust Image ClassificationXin Liu, Yichen Yang, Kun He et al.
Though deep neural networks exhibit superior performance on various tasks, they are still plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective method to defend against adversarial attacks. However, existing adversarial training methods show that the model robustness has apparent oscillations and overfitting issues in the training process, degrading the defense efficacy. To address these issues, we propose a novel framework called Parameter Interpolation Adversarial Training (PIAT). PIAT tunes the model parameters between each epoch by interpolating the parameters of the previous and current epochs. It makes the decision boundary of model change more moderate and alleviates the overfitting issue, helping the model converge better and achieving higher model robustness. In addition, we suggest using the Normalized Mean Square Error (NMSE) to further improve the robustness by aligning the relative magnitude of logits between clean and adversarial examples rather than the absolute magnitude. Extensive experiments conducted on several benchmark datasets demonstrate that our framework could prominently improve the robustness of both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs).
LGOct 17, 2023
SignGT: Signed Attention-based Graph Transformer for Graph Representation LearningJinsong Chen, Gaichao Li, John E. Hopcroft et al.
The emerging graph Transformers have achieved impressive performance for graph representation learning over graph neural networks (GNNs). In this work, we regard the self-attention mechanism, the core module of graph Transformers, as a two-step aggregation operation on a fully connected graph. Due to the property of generating positive attention values, the self-attention mechanism is equal to conducting a smooth operation on all nodes, preserving the low-frequency information. However, only capturing the low-frequency information is inefficient in learning complex relations of nodes on diverse graphs, such as heterophily graphs where the high-frequency information is crucial. To this end, we propose a Signed Attention-based Graph Transformer (SignGT) to adaptively capture various frequency information from the graphs. Specifically, SignGT develops a new signed self-attention mechanism (SignSA) that produces signed attention values according to the semantic relevance of node pairs. Hence, the diverse frequency information between different node pairs could be carefully preserved. Besides, SignGT proposes a structure-aware feed-forward network (SFFN) that introduces the neighborhood bias to preserve the local topology information. In this way, SignGT could learn informative node representations from both long-range dependencies and local topology information. Extensive empirical results on both node-level and graph-level tasks indicate the superiority of SignGT against state-of-the-art graph Transformers as well as advanced GNNs.
CVNov 27, 2022
Class-aware Information for Logit-based Knowledge DistillationShuoxi Zhang, Hanpeng Liu, John E. Hopcroft et al.
Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage of extra feature transformation, the training overhead of feature-based methods is much higher than that of logit-based distillation. In this work, we revisit the logit-based knowledge distillation, and observe that the existing logit-based distillation methods treat the prediction logits only in the instance level, while many other useful semantic information is overlooked. To address this issue, we propose a Class-aware Logit Knowledge Distillation (CLKD) method, that extents the logit distillation in both instance-level and class-level. CLKD enables the student model mimic higher semantic information from the teacher model, hence improving the distillation performance. We further introduce a novel loss called Class Correlation Loss to force the student learn the inherent class-level correlation of the teacher. Empirical comparisons demonstrate the superiority of the proposed method over several prevailing logit-based methods and feature-based methods, in which CLKD achieves compelling results on various visual classification tasks and outperforms the state-of-the-art baselines.
CVNov 15, 2022
Local Magnification for Data and Feature AugmentationKun He, Chang Liu, Stephen Lin et al.
In recent years, many data augmentation techniques have been proposed to increase the diversity of input data and reduce the risk of overfitting on deep neural networks. In this work, we propose an easy-to-implement and model-free data augmentation method called Local Magnification (LOMA). Different from other geometric data augmentation methods that perform global transformations on images, LOMA generates additional training data by randomly magnifying a local area of the image. This local magnification results in geometric changes that significantly broaden the range of augmentations while maintaining the recognizability of objects. Moreover, we extend the idea of LOMA and random cropping to the feature space to augment the feature map, which further boosts the classification accuracy considerably. Experiments show that our proposed LOMA, though straightforward, can be combined with standard data augmentation to significantly improve the performance on image classification and object detection. And further combination with our feature augmentation techniques, termed LOMA_IF&FO, can continue to strengthen the model and outperform advanced intensity transformation methods for data augmentation.
AIOct 31, 2023
Diversified Node Sampling based Hierarchical Transformer Pooling for Graph Representation LearningGaichao Li, Jinsong Chen, John E. Hopcroft et al.
Graph pooling methods have been widely used on downsampling graphs, achieving impressive results on multiple graph-level tasks like graph classification and graph generation. An important line called node dropping pooling aims at exploiting learnable scoring functions to drop nodes with comparatively lower significance scores. However, existing node dropping methods suffer from two limitations: (1) for each pooled node, these models struggle to capture long-range dependencies since they mainly take GNNs as the backbones; (2) pooling only the highest-scoring nodes tends to preserve similar nodes, thus discarding the affluent information of low-scoring nodes. To address these issues, we propose a Graph Transformer Pooling method termed GTPool, which introduces Transformer to node dropping pooling to efficiently capture long-range pairwise interactions and meanwhile sample nodes diversely. Specifically, we design a scoring module based on the self-attention mechanism that takes both global context and local context into consideration, measuring the importance of nodes more comprehensively. GTPool further utilizes a diversified sampling method named Roulette Wheel Sampling (RWS) that is able to flexibly preserve nodes across different scoring intervals instead of only higher scoring nodes. In this way, GTPool could effectively obtain long-range information and select more representative nodes. Extensive experiments on 11 benchmark datasets demonstrate the superiority of GTPool over existing popular graph pooling methods.
CVMar 24, 2023
PIAT: Parameter Interpolation based Adversarial Training for Image ClassificationKun He, Xin Liu, Yichen Yang et al.
Adversarial training has been demonstrated to be the most effective approach to defend against adversarial attacks. However, existing adversarial training methods show apparent oscillations and overfitting issue in the training process, degrading the defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training. Specifically, at the end of each epoch, PIAT tunes the model parameters as the interpolation of the parameters of the previous and current epochs. Besides, we suggest to use the Normalized Mean Square Error (NMSE) to further improve the robustness by aligning the clean and adversarial examples. Compared with other regularization methods, NMSE focuses more on the relative magnitude of the logits rather than the absolute magnitude. Extensive experiments on several benchmark datasets and various networks show that our method could prominently improve the model robustness and reduce the generalization error. Moreover, our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
LGNov 21, 2021Code
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial TransferabilityYifeng Xiong, Jiadong Lin, Min Zhang et al.
The black-box adversarial attack has attracted impressive attention for its practical use in the field of deep learning security. Meanwhile, it is very challenging as there is no access to the network architecture or internal weights of the target model. Based on the hypothesis that if an example remains adversarial for multiple models, then it is more likely to transfer the attack capability to other models, the ensemble-based adversarial attack methods are efficient and widely used for black-box attacks. However, ways of ensemble attack are rather less investigated, and existing ensemble attacks simply fuse the outputs of all the models evenly. In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima. To this end, we propose a novel attack method called the stochastic variance reduced ensemble (SVRE) attack, which could reduce the gradient variance of the ensemble models and take full advantage of the ensemble attack. Empirical results on the standard ImageNet dataset demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly. Code is available at https://github.com/JHL-HUST/SVRE.
LGDec 2, 2024
Tokenizing 3D Molecule Structure with Quantized Spherical CoordinatesKaiyuan Gao, Yusong Wang, Haoxiang Guan et al.
The application of language models (LMs) to molecular structure generation using line notations such as SMILES and SELFIES has been well-established in the field of cheminformatics. However, extending these models to generate 3D molecular structures presents significant challenges. Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic coordinates, and (2) the non-trivial task of tokenizing continuous coordinates for use in LMs, which inherently require discrete inputs. To address these challenges, we propose Mol-StrucTok, a novel method for tokenizing 3D molecular structures. Our approach comprises two key innovations: (1) We design a line notation for 3D molecules by extracting local atomic coordinates in a spherical coordinate system. This notation builds upon existing 2D line notations and remains agnostic to their specific forms, ensuring compatibility with various molecular representation schemes. (2) We employ a Vector Quantized Variational Autoencoder (VQ-VAE) to tokenize these coordinates, treating them as generation descriptors. To further enhance the representation, we incorporate neighborhood bond lengths and bond angles as understanding descriptors. Leveraging this tokenization framework, we train a GPT-2 style model for 3D molecular generation tasks. Results demonstrate strong performance with significantly faster generation speeds and competitive chemical stability compared to previous methods. Further, by integrating our learned discrete representations into Graphormer model for property prediction on QM9 dataset, Mol-StrucTok reveals consistent improvements across various molecular properties, underscoring the versatility and robustness of our approach.
LGFeb 12, 2025
Rethinking Tokenized Graph Transformers for Node ClassificationJinsong Chen, Chenyang Li, GaiChao Li et al.
Node tokenized graph Transformers (GTs) have shown promising performance in node classification. The generation of token sequences is the key module in existing tokenized GTs which transforms the input graph into token sequences, facilitating the node representation learning via Transformer. In this paper, we observe that the generations of token sequences in existing GTs only focus on the first-order neighbors on the constructed similarity graphs, which leads to the limited usage of nodes to generate diverse token sequences, further restricting the potential of tokenized GTs for node classification. To this end, we propose a new method termed SwapGT. SwapGT first introduces a novel token swapping operation based on the characteristics of token sequences that fully leverages the semantic relevance of nodes to generate more informative token sequences. Then, SwapGT leverages a Transformer-based backbone to learn node representations from the generated token sequences. Moreover, SwapGT develops a center alignment loss to constrain the representation learning from multiple token sequences, further enhancing the model performance. Extensive empirical results on various datasets showcase the superiority of SwapGT for node classification.
CLOct 8, 2025
Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream TasksQinhao Zhou, Xiang Xiang, Kun He et al.
In recent years, the growing interest in Large Language Models (LLMs) has significantly advanced prompt engineering, transitioning from manual design to model-based optimization. Prompts for LLMs generally comprise two components: the \textit{instruction}, which defines the task or objective, and the \textit{input}, which is tailored to the instruction type. In natural language generation (NLG) tasks such as machine translation, the \textit{input} component is particularly critical, while the \textit{instruction} component tends to be concise. Existing prompt engineering methods primarily focus on optimizing the \textit{instruction} component for general tasks, often requiring large-parameter LLMs as auxiliary tools. However, these approaches exhibit limited applicability for tasks like machine translation, where the \textit{input} component plays a more pivotal role. To address this limitation, this paper introduces a novel prompt optimization method specifically designed for machine translation tasks. The proposed approach employs a small-parameter model trained using a back-translation-based strategy, significantly reducing training overhead for single-task optimization while delivering highly effective performance. With certain adaptations, this method can also be extended to other downstream tasks.
LGMay 23, 2025
DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node ClassificationChenyang Li, Jinsong Chen, John E. Hopcroft et al.
Neighborhood-aware tokenized graph Transformers have recently shown great potential for node classification tasks. Despite their effectiveness, our in-depth analysis of neighborhood tokens reveals two critical limitations in the existing paradigm. First, current neighborhood token generation methods fail to adequately capture attribute correlations within a neighborhood. Second, the conventional self-attention mechanism suffers from attention diversion when processing neighborhood tokens, where high-hop neighborhoods receive disproportionate focus, severely disrupting information interactions between the target node and its neighborhood tokens. To address these challenges, we propose DAM-GT, Dual positional encoding-based Attention Masking graph Transformer. DAM-GT introduces a novel dual positional encoding scheme that incorporates attribute-aware encoding via an attribute clustering strategy, effectively preserving node correlations in both topological and attribute spaces. In addition, DAM-GT formulates a new attention mechanism with a simple yet effective masking strategy to guide interactions between target nodes and their neighborhood tokens, overcoming the issue of attention diversion. Extensive experiments on various graphs with different homophily levels as well as different scales demonstrate that DAM-GT consistently outperforms state-of-the-art methods in node classification tasks.
LGApr 25, 2025
Enhancing Pre-Trained Model-Based Class-Incremental Learning through Neural CollapseKun He, Zijian Song, Shuoxi Zhang et al.
Class-Incremental Learning (CIL) is a critical capability for real-world applications, enabling learning systems to adapt to new tasks while retaining knowledge from previous ones. Recent advancements in pre-trained models (PTMs) have significantly advanced the field of CIL, demonstrating superior performance over traditional methods. However, understanding how features evolve and are distributed across incremental tasks remains an open challenge. In this paper, we propose a novel approach to modeling feature evolution in PTM-based CIL through the lens of neural collapse (NC), a striking phenomenon observed in the final phase of training, which leads to a well-separated, equiangular feature space. We explore the connection between NC and CIL effectiveness, showing that aligning feature distributions with the NC geometry enhances the ability to capture the dynamic behavior of continual learning. Based on this insight, we introduce Neural Collapse-inspired Pre-Trained Model-based CIL (NCPTM-CIL), a method that dynamically adjusts the feature space to conform to the elegant NC structure, thereby enhancing the continual learning process. Extensive experiments demonstrate that NCPTM-CIL outperforms state-of-the-art methods across four benchmark datasets. Notably, when initialized with ViT-B/16-IN1K, NCPTM-CIL surpasses the runner-up method by 6.73% on VTAB, 1.25% on CIFAR-100, and 2.5% on OmniBenchmark.
LGJun 27, 2024
Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph TransformersJinsong Chen, Hanpeng Liu, John E. Hopcroft et al.
While tokenized graph Transformers have demonstrated strong performance in node classification tasks, their reliance on a limited subset of nodes with high similarity scores for constructing token sequences overlooks valuable information from other nodes, hindering their ability to fully harness graph information for learning optimal node representations. To address this limitation, we propose a novel graph Transformer called GCFormer. Unlike previous approaches, GCFormer develops a hybrid token generator to create two types of token sequences, positive and negative, to capture diverse graph information. And a tailored Transformer-based backbone is adopted to learn meaningful node representations from these generated token sequences. Additionally, GCFormer introduces contrastive learning to extract valuable information from both positive and negative token sequences, enhancing the quality of learned node representations. Extensive experimental results across various datasets, including homophily and heterophily graphs, demonstrate the superiority of GCFormer in node classification, when compared to representative graph neural networks (GNNs) and graph Transformers.
SIDec 8, 2021
Uncovering the Local Hidden Community Structure in Social NetworksMeng Wang, Boyu Li, Kun He et al.
Hidden community is a useful concept proposed recently for social network analysis. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on our modified local spectral method and detect an initial dominant local community. Then we temporarily remove the members of this community as well as their connections to other nodes, and detect all the neighborhood communities in the remaining subgraph, including some "broken communities" that only contain a fraction of members in the original network. The local community and neighborhood communities form a dominant layer, and by reducing the edge weights inside these communities, we weaken this layer's structure to reveal the hidden layers. Eventually, we repeat the whole process and all communities containing the seed node can be detected and boosted iteratively. We theoretically show that our method can avoid some situations that a broken community and the local community are regarded as one community in the subgraph, leading to the inaccuracy on detection which can be caused by global hidden community detection methods. Extensive experiments show that our method could significantly outperform the state-of-the-art baselines designed for either global hidden community detection or multiple local community detection.
SIJul 31, 2021
Structure Amplification on Multi-layer Stochastic Block ModelsXiaodong Xin, Kun He, Jialu Bao et al.
Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our previous work proposes a general structure amplification technique called HICODE that uncovers many layers of functional hidden structure in complex networks. HICODE incrementally weakens dominant structure through randomization allowing the hidden functionality to emerge, and uncovers these hidden structure in real-world networks that previous methods rarely uncover. In this work, we conduct a comprehensive and systematic theoretical analysis on the hidden community structure. In what follows, we define multi-layer stochastic block model, and provide theoretical support using the model on why the existence of hidden structure will make the detection of dominant structure harder compared with equivalent random noise. We then provide theoretical proofs that the iterative reducing methods could help promote the uncovering of hidden structure as well as boosting the detection quality of dominant structure.
CVJul 6, 2021
Integrating Large Circular Kernels into CNNs through Neural Architecture SearchKun He, Chao Li, Yixiao Yang et al.
The square kernel is a standard unit for contemporary CNNs, as it fits well on the tensor computation for convolution operation. However, the retinal ganglion cells in the biological visual system have approximately concentric receptive fields. Motivated by this observation, we propose to use circular kernel with a concentric and isotropic receptive field as an option for the convolution operation. We first propose a simple yet efficient implementation of the convolution using circular kernels, and empirically show the significant advantages of large circular kernels over the counterpart square kernels. We then expand the operation space of several typical Neural Architecture Search (NAS) methods with the convolutions of large circular kernels. The searched new neural architectures do contain large circular kernels and outperform the original searched models considerably. Our additional analysis also reveals that large circular kernels could help the model to be more robust to the rotated or sheared images due to their better rotation invariance. Our work shows the potential of designing new convolutional kernels for CNNs, bringing up the prospect of expanding the search space of NAS with new variants of convolutions.
CVNov 15, 2019
Single Image Reflection Removal through Cascaded RefinementChao Li, Yixiao Yang, Kun He et al.
We address the problem of removing undesirable reflections from a single image captured through a glass surface, which is an ill-posed, challenging but practically important problem for photo enhancement. Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for reflection removal. IBCLN is a cascaded network that iteratively refines the estimates of transmission and reflection layers in a manner that they can boost the prediction quality to each other, and information across steps of the cascade is transferred using an LSTM. The intuition is that the transmission is the strong, dominant structure while the reflection is the weak, hidden structure. They are complementary to each other in a single image and thus a better estimate and reduction on one side from the original image leads to a more accurate estimate on the other side. To facilitate training over multiple cascade steps, we employ LSTM to address the vanishing gradient problem, and propose residual reconstruction loss as further training guidance. Besides, we create a dataset of real-world images with reflection and ground-truth transmission layers to mitigate the problem of insufficient data. Comprehensive experiments demonstrate that the proposed method can effectively remove reflections in real and synthetic images compared with state-of-the-art reflection removal methods.
CVSep 23, 2019
Robust Local Features for Improving the Generalization of Adversarial TrainingChuanbiao Song, Kun He, Jiadong Lin et al.
Adversarial training has been demonstrated as one of the most effective methods for training robust models to defend against adversarial examples. However, adversarially trained models often lack adversarially robust generalization on unseen testing data. Recent works show that adversarially trained models are more biased towards global structure features. Instead, in this work, we would like to investigate the relationship between the generalization of adversarial training and the robust local features, as the robust local features generalize well for unseen shape variation. To learn the robust local features, we develop a Random Block Shuffle (RBS) transformation to break up the global structure features on normal adversarial examples. We continue to propose a new approach called Robust Local Features for Adversarial Training (RLFAT), which first learns the robust local features by adversarial training on the RBS-transformed adversarial examples, and then transfers the robust local features into the training of normal adversarial examples. To demonstrate the generality of our argument, we implement RLFAT in currently state-of-the-art adversarial training frameworks. Extensive experiments on STL-10, CIFAR-10 and CIFAR-100 show that RLFAT significantly improves both the adversarially robust generalization and the standard generalization of adversarial training. Additionally, we demonstrate that our models capture more local features of the object on the images, aligning better with human perception.
LGAug 17, 2019
Nesterov Accelerated Gradient and Scale Invariance for Adversarial AttacksJiadong Lin, Chuanbiao Song, Kun He et al.
Deep learning models are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on benign inputs. However, under the black-box setting, most existing adversaries often have a poor transferability to attack other defense models. In this work, from the perspective of regarding the adversarial example generation as an optimization process, we propose two new methods to improve the transferability of adversarial examples, namely Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and Scale-Invariant attack Method (SIM). NI-FGSM aims to adapt Nesterov accelerated gradient into the iterative attacks so as to effectively look ahead and improve the transferability of adversarial examples. While SIM is based on our discovery on the scale-invariant property of deep learning models, for which we leverage to optimize the adversarial perturbations over the scale copies of the input images so as to avoid "overfitting" on the white-box model being attacked and generate more transferable adversarial examples. NI-FGSM and SIM can be naturally integrated to build a robust gradient-based attack to generate more transferable adversarial examples against the defense models. Empirical results on ImageNet dataset demonstrate that our attack methods exhibit higher transferability and achieve higher attack success rates than state-of-the-art gradient-based attacks.
IRMay 10, 2019
A New Anchor Word Selection Method for the Separable Topic DiscoveryKun He, Wu Wang, Xiaosen Wang et al.
Separable Non-negative Matrix Factorization (SNMF) is an important method for topic modeling, where "separable" assumes every topic contains at least one anchor word, defined as a word that has non-zero probability only on that topic. SNMF focuses on the word co-occurrence patterns to reveal topics by two steps: anchor word selection and topic recovery. The quality of the anchor words strongly influences the quality of the extracted topics. Existing anchor word selection algorithm is to greedily find an approximate convex hull in a high-dimensional word co-occurrence space. In this work, we propose a new method for the anchor word selection by associating the word co-occurrence probability with the words similarity and assuming that the most different words on semantic are potential candidates for the anchor words. Therefore, if the similarity of a word-pair is very low, then the two words are very likely to be the anchor words. According to the statistical information of text corpora, we can get the similarity of all word-pairs. We build the word similarity graph where the nodes correspond to words and weights on edges stand for the word-pair similarity. Following this way, we design a greedy method to find a minimum edge-weight anchor clique of a given size in the graph for the anchor word selection. Extensive experiments on real-world corpus demonstrate the effectiveness of the proposed anchor word selection method that outperforms the common convex hull-based methods on the revealed topic quality. Meanwhile, our method is much faster than typical SNMF based method.
CVApr 16, 2019
AT-GAN: An Adversarial Generator Model for Non-constrained Adversarial ExamplesXiaosen Wang, Kun He, Chuanbiao Song et al.
Despite the rapid development of adversarial machine learning, most adversarial attack and defense researches mainly focus on the perturbation-based adversarial examples, which is constrained by the input images. In comparison with existing works, we propose non-constrained adversarial examples, which are generated entirely from scratch without any constraint on the input. Unlike perturbation-based attacks, or the so-called unrestricted adversarial attack which is still constrained by the input noise, we aim to learn the distribution of adversarial examples to generate non-constrained but semantically meaningful adversarial examples. Following this spirit, we propose a novel attack framework called AT-GAN (Adversarial Transfer on Generative Adversarial Net). Specifically, we first develop a normal GAN model to learn the distribution of benign data, and then transfer the pre-trained GAN model to estimate the distribution of adversarial examples for the target model. In this way, AT-GAN can learn the distribution of adversarial examples that is very close to the distribution of real data. To our knowledge, this is the first work of building an adversarial generator model that could produce adversarial examples directly from any input noise. Extensive experiments and visualizations show that the proposed AT-GAN can very efficiently generate diverse adversarial examples that are more realistic to human perception. In addition, AT-GAN yields higher attack success rates against adversarially trained models under white-box attack setting and exhibits moderate transferability against black-box models.
DBNov 27, 2018
Adaptive Wavelet Clustering for Highly Noisy DataZengjian Chen, Jiayi Liu, Yihe Deng et al.
In this paper we make progress on the unsupervised task of mining arbitrarily shaped clusters in highly noisy datasets, which is a task present in many real-world applications. Based on the fundamental work that first applies a wavelet transform to data clustering, we propose an adaptive clustering algorithm, denoted as AdaWave, which exhibits favorable characteristics for clustering. By a self-adaptive thresholding technique, AdaWave is parameter free and can handle data in various situations. It is deterministic, fast in linear time, order-insensitive, shape-insensitive, robust to highly noisy data, and requires no pre-knowledge on data models. Moreover, AdaWave inherits the ability from the wavelet transform to cluster data in different resolutions. We adopt the "grid labeling" data structure to drastically reduce the memory consumption of the wavelet transform so that AdaWave can be used for relatively high dimensional data. Experiments on synthetic as well as natural datasets demonstrate the effectiveness and efficiency of our proposed method.
LGOct 1, 2018
Improving the Generalization of Adversarial Training with Domain AdaptationChuanbiao Song, Kun He, Liwei Wang et al.
By injecting adversarial examples into training data, adversarial training is promising for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial attack. It may not provide sufficiently representative samples from the adversarial domain, leading to a weak generalization ability on adversarial examples from other attacks. Moreover, during the adversarial training, adversarial perturbations on inputs are usually crafted by fast single-step adversaries so as to scale to large datasets. This work is mainly focused on the adversarial training yet efficient FGSM adversary. In this scenario, it is difficult to train a model with great generalization due to the lack of representative adversarial samples, aka the samples are unable to accurately reflect the adversarial domain. To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to regard the adversarial training on FGSM adversary as a domain adaption task with limited number of target domain samples. The main idea is to learn a representation that is semantically meaningful and domain invariant on the clean domain as well as the adversarial domain. Empirical evaluations on Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets. To show the transfer ability of our method, we also extend ATDA to the adversarial training on iterative attacks such as PGD-Adversial Training (PAT) and the defense performance is improved considerably.
LGJan 21, 2018
Curvature-based Comparison of Two Neural NetworksTao Yu, Huan Long, John E. Hopcroft
In this paper we show the similarities and differences of two deep neural networks by comparing the manifolds composed of activation vectors in each fully connected layer of them. The main contribution of this paper includes 1) a new data generating algorithm which is crucial for determining the dimension of manifolds; 2) a systematic strategy to compare manifolds. Especially, we take Riemann curvature and sectional curvature as part of criterion, which can reflect the intrinsic geometric properties of manifolds. Some interesting results and phenomenon are given, which help in specifying the similarities and differences between the features extracted by two networks and demystifying the intrinsic mechanism of deep neural networks.
CVNov 5, 2017
The Local Dimension of Deep ManifoldMengxiao Zhang, Wangquan Wu, Yanren Zhang et al.
Based on our observation that there exists a dramatic drop for the singular values of the fully connected layers or a single feature map of the convolutional layer, and that the dimension of the concatenated feature vector almost equals the summation of the dimension on each feature map, we propose a singular value decomposition (SVD) based approach to estimate the dimension of the deep manifolds for a typical convolutional neural network VGG19. We choose three categories from the ImageNet, namely Persian Cat, Container Ship and Volcano, and determine the local dimension of the deep manifolds of the deep layers through the tangent space of a target image. Through several augmentation methods, we found that the Gaussian noise method is closer to the intrinsic dimension, as by adding random noise to an image we are moving in an arbitrary dimension, and when the rank of the feature matrix of the augmented images does not increase we are very close to the local dimension of the manifold. We also estimate the dimension of the deep manifold based on the tangent space for each of the maxpooling layers. Our results show that the dimensions of different categories are close to each other and decline quickly along the convolutional layers and fully connected layers. Furthermore, we show that the dimensions decline quickly inside the Conv5 layer. Our work provides new insights for the intrinsic structure of deep neural networks and helps unveiling the inner organization of the black box of deep neural networks.
CVApr 2, 2017
Randomness in Deconvolutional Networks for Visual RepresentationKun He, Jingbo Wang, Haochuan Li et al.
Toward a deeper understanding on the inner work of deep neural networks, we investigate CNN (convolutional neural network) using DCN (deconvolutional network) and randomization technique, and gain new insights for the intrinsic property of this network architecture. For the random representations of an untrained CNN, we train the corresponding DCN to reconstruct the input images. Compared with the image inversion on pre-trained CNN, our training converges faster and the yielding network exhibits higher quality for image reconstruction. It indicates there is rich information encoded in the random features; the pre-trained CNN may discard information irrelevant for classification and encode relevant features in a way favorable for classification but harder for reconstruction. We further explore the property of the overall random CNN-DCN architecture. Surprisingly, images can be inverted with satisfactory quality. Extensive empirical evidence as well as theoretical analysis are provided.
LGApr 1, 2017
Snapshot Ensembles: Train 1, get M for freeGao Huang, Yixuan Li, Geoff Pleiss et al.
Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively.
SIFeb 24, 2017
Hidden Community Detection in Social NetworksKun He, Yingru Li, Sucheta Soundarajan et al.
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, they are extremely hard to be uncovered. We call the weak communities the hidden community structure. We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously. Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration or career position of the faculties or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. In the Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
LGNov 19, 2015
Deep Manifold Traversal: Changing Labels with Convolutional FeaturesJacob R. Gardner, Paul Upchurch, Matt J. Kusner et al.
Many tasks in computer vision can be cast as a "label changing" problem, where the goal is to make a semantic change to the appearance of an image or some subject in an image in order to alter the class membership. Although successful task-specific methods have been developed for some label changing applications, to date no general purpose method exists. Motivated by this we propose deep manifold traversal, a method that addresses the problem in its most general form: it first approximates the manifold of natural images then morphs a test image along a traversal path away from a source class and towards a target class while staying near the manifold throughout. The resulting algorithm is surprisingly effective and versatile. It is completely data driven, requiring only an example set of images from the desired source and target domains. We demonstrate deep manifold traversal on highly diverse label changing tasks: changing an individual's appearance (age and hair color), changing the season of an outdoor image, and transforming a city skyline towards nighttime.