SEDec 15, 2022
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability DetectionBenjamin Steenhoek, Hongyang Gao, Wei Le
Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient to capture code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose to combine such causal-based vulnerability detection algorithms with deep learning, aiming to achieve more efficient and effective vulnerability detection. Specifically, we designed DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient. DeepDFA outperformed all non-transformer baselines. It was trained in 9 minutes, 75x faster than the highest-performing baseline model. When using only 50+ vulnerable and several hundreds of total examples as training data, the model retained the same performance as 100% of the dataset. DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities. By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with 96.46 F1 score, 97.82 precision, and 95.14 recall. Our replication package is located at https://doi.org/10.6084/m9.figshare.21225413 .
LGSep 30, 2022
On the optimization and generalization of overparameterized implicit neural networksTianxiang Gao, Hongyang Gao
Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance but use much less computational resources. Recently, a line of theoretical works established the global convergences for first-order methods such as gradient descent if the implicit networks are over-parameterized. However, as they train all layers together, their analyses are equivalent to only studying the evolution of the output layer. It is unclear how the implicit layer contributes to the training. Thus, in this paper, we restrict ourselves to only training the implicit layer. We show that global convergence is guaranteed, even if only the implicit layer is trained. On the other hand, the theoretical understanding of when and how the training performance of an implicit neural network can be generalized to unseen data is still under-explored. Although this problem has been studied in standard feed-forward networks, the case of implicit neural networks is still intriguing since implicit networks theoretically have infinitely many layers. Therefore, this paper investigates the generalization error for implicit neural networks. Specifically, we study the generalization of an implicit network activated by the ReLU function over random initialization. We provide a generalization bound that is initialization sensitive. As a result, we show that gradient flow with proper random initialization can train a sufficient over-parameterized implicit network to achieve arbitrarily small generalization errors.
LGMay 16, 2022
Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear WidthsTianxiang Gao, Hongyang Gao
Implicit deep learning has recently become popular in the machine learning community since these implicit models can achieve competitive performance with state-of-the-art deep networks while using significantly less memory and computational resources. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on \textit{nonlinear} implicit networks is limited. Although this type of problem has been studied in standard feed-forward networks, the case of implicit models is still intriguing because implicit networks have \textit{infinitely} many layers. The corresponding equilibrium equation probably admits no or multiple solutions during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU activated implicit networks. To deal with the well-posedness problem, we introduce a fixed scalar to scale the weight matrix of the implicit layer and show that there exists a small enough scaling constant, keeping the equilibrium equation well-posed throughout training. As a result, we prove that both GF and GD converge to a global minimum at a linear rate if the width $m$ of the implicit network is \textit{linear} in the sample size $N$, i.e., $m=Ω(N)$.
LGOct 16, 2023
Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium ModelsTianxiang Gao, Xiaokai Huo, Hailiang Liu et al.
Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.
LGMay 21, 2024
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph GenerationZhaoning Yu, Hongyang Gao
Graph Neural Networks (GNNs) have shown remarkable success in molecular tasks, yet their interpretability remains challenging. Traditional model-level explanation methods like XGNN and GNNInterpreter often fail to identify valid substructures like rings, leading to questionable interpretability. This limitation stems from XGNN's atom-by-atom approach and GNNInterpreter's reliance on average graph embeddings, which overlook the essential structural elements crucial for molecules. To address these gaps, we introduce an innovative \textbf{M}otif-b\textbf{A}sed \textbf{G}NN \textbf{E}xplainer (MAGE) that uses motifs as fundamental units for generating explanations. Our approach begins with extracting potential motifs through a motif decomposition technique. Then, we utilize an attention-based learning method to identify class-specific motifs. Finally, we employ a motif-based graph generator for each class to create molecular graph explanations based on these class-specific motifs. This novel method not only incorporates critical substructures into the explanations but also guarantees their validity, yielding results that are human-understandable. Our proposed method's effectiveness is demonstrated through quantitative and qualitative assessments conducted on six real-world molecular datasets.
LGSep 26, 2025
Global Convergence in Neural ODEs: Impact of Activation FunctionsTianxiang Gao, Siyuan Sun, Hailiang Liu et al.
Neural Ordinary Differential Equations (ODEs) have been successful in various applications due to their continuous nature and parameter-sharing efficiency. However, these unique characteristics also introduce challenges in training, particularly with respect to gradient computation accuracy and convergence analysis. In this paper, we address these challenges by investigating the impact of activation functions. We demonstrate that the properties of activation functions, specifically smoothness and nonlinearity, are critical to the training dynamics. Smooth activation functions guarantee globally unique solutions for both forward and backward ODEs, while sufficient nonlinearity is essential for maintaining the spectral properties of the Neural Tangent Kernel (NTK) during training. Together, these properties enable us to establish the global convergence of Neural ODEs under gradient descent in overparameterized regimes. Our theoretical findings are validated by numerical experiments, which not only support our analysis but also provide practical guidelines for scaling Neural ODEs, potentially leading to faster training and improved performance in real-world applications.
CLOct 2, 2025
RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-PenalizationZhaoning Yu, Will Su, Leitian Tao et al.
Reinforcement learning with human-annotated data has boosted chain-of-thought reasoning in large reasoning models, but these gains come at high costs in labeled data while faltering on harder tasks. A natural next step is experience-driven learning, where models improve without curated labels by adapting to unlabeled data. We introduce RESTRAIN (REinforcement learning with Self-restraint), a self-penalizing RL framework that converts the absence of gold labels into a useful learning signal. Instead of overcommitting to spurious majority votes, RESTRAIN exploits signals from the model's entire answer distribution: penalizing overconfident rollouts and low-consistency examples while preserving promising reasoning chains. The self-penalization mechanism integrates seamlessly into policy optimization methods such as GRPO, enabling continual self-improvement without supervision. On challenging reasoning benchmarks, RESTRAIN delivers large gains using only unlabeled data. With Qwen3-4B-Base and OctoThinker Hybrid-8B-Base, it improves Pass@1 by up to +140.7 percent on AIME25, +36.2 percent on MMLU_STEM, and +19.6 percent on GPQA-Diamond, nearly matching gold-label training while using no gold labels. These results demonstrate that RESTRAIN establishes a scalable path toward stronger reasoning without gold labels.
QMDec 24, 2023
MotifPiece: A Data-Driven Approach for Effective Motif Extraction and Molecular Representation LearningZhaoning Yu, Hongyang Gao
Motif extraction is an important task in motif based molecular representation learning. Previously, machine learning approaches employing either rule-based or string-based techniques to extract motifs. Rule-based approaches may extract motifs that aren't frequent or prevalent within the molecular data, which can lead to an incomplete understanding of essential structural patterns in molecules. String-based methods often lose the topological information inherent in molecules. This can be a significant drawback because topology plays a vital role in defining the spatial arrangement and connectivity of atoms within a molecule, which can be critical for understanding its properties and behavior. In this paper, we develop a data-driven motif extraction technique known as MotifPiece, which employs statistical measures to define motifs. To comprehensively evaluate the effectiveness of MotifPiece, we introduce a heterogeneous learning module. Our model shows an improvement compared to previously reported models. Additionally, we demonstrate that its performance can be further enhanced in two ways: first, by incorporating more data to aid in generating a richer motif vocabulary, and second, by merging multiple datasets that share enough motifs, allowing for cross-dataset learning.
LGOct 24, 2025
M-GLC: Motif-Driven Global-Local Context Graphs for Few-shot Molecular Property PredictionXiangyang Xu, Hongyang Gao
Molecular property prediction (MPP) is a cornerstone of drug discovery and materials science, yet conventional deep learning approaches depend on large labeled datasets that are often unavailable. Few-shot Molecular property prediction (FSMPP) addresses this scarcity by incorporating relational inductive bias through a context graph that links molecule nodes to property nodes, but such molecule-property graphs offer limited structural guidance. We propose a comprehensive solution: Motif Driven Global-Local Context Graph for few-shot molecular property prediction, which enriches contextual information at both the global and local levels. At the global level, chemically meaningful motif nodes representing shared substructures, such as rings or functional groups, are introduced to form a global tri-partite heterogeneous graph, yielding motif-molecule-property connections that capture long-range compositional patterns and enable knowledge transfer among molecules with common motifs. At the local level, we build a subgraph for each node in the molecule-property pair and encode them separately to concentrate the model's attention on the most informative neighboring molecules and motifs. Experiments on five standard FSMPP benchmarks demonstrate that our framework consistently outperforms state-of-the-art methods. These results underscore the effectiveness of integrating global motif knowledge with fine-grained local context to advance robust few-shot molecular property prediction.
LGOct 6, 2025
Flow-Matching Based Refiner for Molecular Conformer GenerationXiangyang Xu, Hongyang Gao
Low-energy molecular conformers generation (MCG) is a foundational yet challenging problem in drug discovery. Denoising-based methods include diffusion and flow-matching methods that learn mappings from a simple base distribution to the molecular conformer distribution. However, these approaches often suffer from error accumulation during sampling, especially in the low SNR steps, which are hard to train. To address these challenges, we propose a flow-matching refiner for the MCG task. The proposed method initializes sampling from mixed-quality outputs produced by upstream denoising models and reschedules the noise scale to bypass the low-SNR phase, thereby improving sample quality. On the GEOM-QM9 and GEOM-Drugs benchmark datasets, the generator-refiner pipeline improves quality with fewer total denoising steps while preserving diversity.
SEJan 26, 2024
Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in DeploymentShibbir Ahmed, Hongyang Gao, Hridesh Rajan
Deep learning models are trained with certain assumptions about the data during the development stage and then used for prediction in the deployment stage. It is important to reason about the trustworthiness of the model's predictions with unseen data during deployment. Existing methods for specifying and verifying traditional software are insufficient for this task, as they cannot handle the complexity of DNN model architecture and expected outcomes. In this work, we propose a novel technique that uses rules derived from neural network computations to infer data preconditions for a DNN model to determine the trustworthiness of its predictions. Our approach, DeepInfer involves introducing a novel abstraction for a trained DNN model that enables weakest precondition reasoning using Dijkstra's Predicate Transformer Semantics. By deriving rules over the inductive type of neural network abstract representation, we can overcome the matrix dimensionality issues that arise from the backward non-linear computation from the output layer to the input layer. We utilize the weakest precondition computation using rules of each kind of activation function to compute layer-wise precondition from the given postcondition on the final output of a deep neural network. We extensively evaluated DeepInfer on 29 real-world DNN models using four different datasets collected from five different sources and demonstrated the utility, effectiveness, and performance improvement over closely related work. DeepInfer efficiently detects correct and incorrect predictions of high-accuracy models with high recall (0.98) and high F-1 score (0.84) and has significantly improved over prior technique, SelfChecker. The average runtime overhead of DeepInfer is low, 0.22 sec for all unseen datasets. We also compared runtime overhead using the same hardware settings and found that DeepInfer is 3.27 times faster than SelfChecker.
LGFeb 1, 2022
Molecular Representation Learning via Heterogeneous Motif Graph Neural NetworksZhaoning Yu, Hongyang Gao
We consider feature representation learning problem of molecular graphs. Graph Neural Networks have been widely used in feature representation learning of molecular graphs. However, most existing methods deal with molecular graphs individually while neglecting their connections, such as motif-level relationships. We propose a novel molecular graph representation learning method by constructing a heterogeneous motif graph to address this issue. In particular, we build a heterogeneous motif graph that contains motif nodes and molecular nodes. Each motif node corresponds to a motif extracted from molecules. Then, we propose a Heterogeneous Motif Graph Neural Network (HM-GNN) to learn feature representations for each node in the heterogeneous motif graph. Our heterogeneous motif graph also enables effective multi-task learning, especially for small molecular datasets. To address the potential efficiency issue, we propose to use an edge sampler, which can significantly reduce computational resources usage. The experimental results show that our model consistently outperforms previous state-of-the-art models. Under multi-task settings, the promising performances of our methods on combined datasets shed light on a new learning paradigm for small molecular datasets. Finally, we show that our model achieves similar performances with significantly less computational resources by using our edge sampler.
LGFeb 1, 2022
MotifExplainer: a Motif-based Graph Neural Network ExplainerZhaoning Yu, Hongyang Gao
We consider the explanation problem of Graph Neural Networks (GNNs). Most existing GNN explanation methods identify the most important edges or nodes but fail to consider substructures, which are more important for graph data. The only method that considers subgraphs tries to search all possible subgraphs and identify the most significant subgraphs. However, the subgraphs identified may not be recurrent or statistically important. In this work, we propose a novel method, known as MotifExplainer, to explain GNNs by identifying important motifs, recurrent and statistically significant patterns in graphs. Our proposed motif-based methods can provide better human-understandable explanations than methods based on nodes, edges, and regular subgraphs. Given an input graph and a pre-trained GNN model, our method first extracts motifs in the graph using well-designed motif extraction rules. Then we generate motif embedding by feeding motifs into the pre-trained GNN. Finally, we employ an attention-based method to identify the most influential motifs as explanations for the final prediction results. The empirical studies on both synthetic and real-world datasets demonstrate the effectiveness of our method.
LGOct 11, 2021
A global convergence theory for deep ReLU implicit networks via over-parameterizationTianxiang Gao, Hailiang Liu, Jia Liu et al.
Implicit deep learning has received increasing attention recently due to the fact that it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is provided implicitly based on the solution of an equilibrium equation. Although a line of recent empirical studies has demonstrated its superior performances, the theoretical understanding of implicit neural networks is limited. In general, the equilibrium equation may not be well-posed during the training. As a result, there is no guarantee that a vanilla (stochastic) gradient descent (SGD) training nonlinear implicit neural networks can converge. This paper fills the gap by analyzing the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks. For an $m$-width implicit neural network with ReLU activation and $n$ training samples, we show that a randomly initialized gradient descent converges to a global minimum at a linear rate for the square loss function if the implicit neural network is \textit{over-parameterized}. It is worth noting that, unlike existing works on the convergence of (S)GD on finite-layer over-parameterized neural networks, our convergence results hold for implicit neural networks, where the number of layers is \textit{infinite}.
CLMar 15, 2021
Sent2Matrix: Folding Character Sequences in Serpentine Manifolds for Two-Dimensional SentenceHongyang Gao, Yi Liu, Xuan Zhang et al.
We study text representation methods using deep models. Current methods, such as word-level embedding and character-level embedding schemes, treat texts as either a sequence of atomic words or a sequence of characters. These methods either ignore word morphologies or word boundaries. To overcome these limitations, we propose to convert texts into 2-D representations and develop the Sent2Matrix method. Our method allows for the explicit incorporation of both word morphologies and boundaries. When coupled with a novel serpentine padding method, our Sent2Matrix method leads to an interesting visualization in which 1-D character sequences are folded into 2-D serpentine manifolds. Notably, our method is the first attempt to represent texts in 2-D formats. Experimental results on text classification tasks shown that our method consistently outperforms prior embedding methods.
LGOct 19, 2020
Topology-Aware Graph Pooling NetworksHongyang Gao, Yi Liu, Shuiwang Ji
Pooling operations have shown to be effective on computer vision and natural language processing tasks. One challenge of performing pooling operations on graph data is the lack of locality that is not well-defined on graphs. Previous studies used global ranking methods to sample some of the important nodes, but most of them are not able to incorporate graph topology. In this work, we propose the topology-aware pooling (TAP) layer that explicitly considers graph topology. Our TAP layer is a two-stage voting process that selects more important nodes in a graph. It first performs local voting to generate scores for each node by attending each node to its neighboring nodes. The scores are generated locally such that topology information is explicitly considered. In addition, graph topology is incorporated in global voting to compute the importance score of each node globally in the entire graph. Altogether, the final ranking score for each node is computed by combining its local and global voting scores. To encourage better graph connectivity in the sampled graph, we propose to add a graph connectivity term to the computation of ranking scores. Results on graph classification tasks demonstrate that our methods achieve consistently better performance than previous methods.
LGJul 18, 2020
Towards Deeper Graph Neural NetworksMeng Liu, Hongyang Gao, Shuiwang Ji
Graph neural networks have shown significant success in the field of graph representation learning. Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations. Nevertheless, one layer of these neighborhood aggregation methods only consider immediate neighbors, and the performance decreases when going deeper to enable larger receptive fields. Several recent studies attribute this performance deterioration to the over-smoothing issue, which states that repeated propagation makes node representations of different classes indistinguishable. In this work, we study this observation systematically and develop new insights towards deeper graph neural networks. First, we provide a systematical analysis on this issue and argue that the key factor compromising the performance significantly is the entanglement of representation transformation and propagation in current graph convolution operations. After decoupling these two operations, deeper graph neural networks can be used to learn graph node representations from larger receptive fields. We further provide a theoretical analysis of the above observation when building very deep models, which can serve as a rigorous and gentle description of the over-smoothing issue. Based on our theoretical and empirical analysis, we propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields. A set of experiments on citation, co-authorship, and co-purchase datasets have confirmed our analysis and insights and demonstrated the superiority of our proposed methods.
CVJul 16, 2020
Kronecker Attention NetworksHongyang Gao, Zhengyang Wang, Shuiwang Ji
Attention operators have been applied on both 1-D data like texts and higher-order data such as images and videos. Use of attention operators on high-order data requires flattening of the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive requirements on computational resources, but also fails to preserve structures in data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving competitive performance as those with original attention operators.
LGJul 5, 2019
Graph Representation Learning via Hard and Channel-Wise Attention NetworksHongyang Gao, Shuiwang Ji
Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enables learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their applications on large graphs. In addition, GAOs belong to the family of soft attention, instead of hard attention, which has been shown to yield better performance. In this work, we propose novel hard graph attention operator (hGAO) and channel-wise graph attention operator (cGAO). hGAO uses the hard attention mechanism by attending to only important nodes. Compared to GAO, hGAO improves performance and saves computational cost by only attending to important nodes. To further reduce the requirements on computational resources, we propose the cGAO that performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Experimental results demonstrate that our proposed deep models with the new operators achieve consistently better performance. Comparison results also indicates that hGAO achieves significantly better performance than GAO on both node and graph embedding tasks. Efficiency comparison shows that our cGAO leads to dramatic savings in computational resources, making them applicable to large graphs.
LGMay 11, 2019
Graph U-NetsHongyang Gao, Shuiwang Ji
We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Given images are special cases of graphs with nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied on many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is due to the fact that pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph into its original structure using the position information of nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graph, known as the graph U-Nets. Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models.
AIJan 21, 2019
Learning Graph Pooling and Hybrid Convolutional Operations for Text RepresentationsHongyang Gao, Yongjun Chen, Shuiwang Ji
With the development of graph convolutional networks (GCN), deep learning methods have started to be used on graph data. In additional to convolutional layers, pooling layers are another important components of deep learning. However, no effective pooling methods have been developed for graphs currently. In this work, we propose the graph pooling (gPool) layer, which employs a trainable projection vector to measure the importance of nodes in graphs. By selecting the k-most important nodes to form the new graph, gPool achieves the same objective as regular max pooling layers operating on images. Another limitation of GCN when used on graph-based text representation tasks is that, GCNs do not consider the order information of nodes in graph. To address this limitation, we propose the hybrid convolutional (hConv) layer that combines GCN and regular convolutional operations. The hConv layer is capable of increasing receptive fields quickly and computing features automatically. Based on the proposed gPool and hConv layers, we develop new deep networks for text categorization tasks. Our results show that the networks based on gPool and hConv layers achieves new state-of-the-art performance as compared to baseline methods.
CVSep 5, 2018
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise ConvolutionsHongyang Gao, Zhengyang Wang, Shuiwang Ji
Convolutional neural networks (CNNs) have shown great capability of solving various artificial intelligence tasks. However, the increasing model size has raised challenges in employing them in resource-limited applications. In this work, we propose to compress deep models by using channel-wise convolutions, which re- place dense connections among feature maps with sparse ones in CNNs. Based on this novel operation, we build light-weight CNNs known as ChannelNets. Channel- Nets use three instances of channel-wise convolutions; namely group channel-wise convolutions, depth-wise separable channel-wise convolutions, and the convolu- tional classification layer. Compared to prior CNNs designed for mobile devices, ChannelNets achieve a significant reduction in terms of the number of parameters and computational cost without loss in accuracy. Notably, our work represents the first attempt to compress the fully-connected classification layer, which usually accounts for about 25% of total parameters in compact CNNs. Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods.
LGAug 12, 2018
Large-Scale Learnable Graph Convolutional NetworksHongyang Gao, Zhengyang Wang, Shuiwang Ji
Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, the number of neighboring units is neither fixed nor are they ordered in generic graphs, thereby hindering the applications of convolutional operations. Here, we address these challenges by proposing the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolutional operations on generic graphs. To enable model training on large-scale graphs, we propose a sub-graph training method to reduce the excessive memory and computational resource requirements suffered by prior methods on graph convolutions. Our experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that our methods can achieve consistently better performance on the Cora, Citeseer, Pubmed citation network, and protein-protein interaction network datasets. Our results also indicate that the proposed methods using sub-graph training strategy are more efficient as compared to prior approaches.
CVNov 24, 2017
Efficient and Invariant Convolutional Neural Networks for Dense PredictionHongyang Gao, Shuiwang Ji
Convolutional neural networks have shown great success on feature extraction from raw input data such as images. Although convolutional neural networks are invariant to translations on the inputs, they are not invariant to other transformations, including rotation and flip. Recent attempts have been made to incorporate more invariance in image recognition applications, but they are not applicable to dense prediction tasks, such as image segmentation. In this paper, we propose a set of methods based on kernel rotation and flip to enable rotation and flip invariance in convolutional neural networks. The kernel rotation can be achieved on kernels of 3 $\times$ 3, while kernel flip can be applied on kernels of any size. By rotating in eight or four angles, the convolutional layers could produce the corresponding number of feature maps based on eight or four different kernels. By using flip, the convolution layer can produce three feature maps. By combining produced feature maps using maxout, the resource requirement could be significantly reduced while still retain the invariance properties. Experimental results demonstrate that the proposed methods can achieve various invariance at reasonable resource requirements in terms of both memory and time.
CVMay 19, 2017
Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image GenerationLei Cai, Hongyang Gao, Shuiwang Ji
Variational auto-encoder (VAE) is a powerful unsupervised learning framework for image generation. One drawback of VAE is that it generates blurry images due to its Gaussianity assumption and thus L2 loss. To allow the generation of high quality images by VAE, we increase the capacity of decoder network by employing residual blocks and skip connections, which also enable efficient optimization. To overcome the limitation of L2 loss, we propose to generate images in a multi-stage manner from coarse to fine. In the simplest case, the proposed multi-stage VAE divides the decoder into two components in which the second component generates refined images based on the course images generated by the first component. Since the second component is independent of the VAE model, it can employ other loss functions beyond the L2 loss and different model architectures. The proposed framework can be easily generalized to contain more than two components. Experiment results on the MNIST and CelebA datasets demonstrate that the proposed multi-stage VAE can generate sharper images as compared to those from the original VAE.
LGMay 18, 2017
Pixel Deconvolutional NetworksHongyang Gao, Hao Yuan, Zhengyang Wang et al.
Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel deconvolutional layer (PixelDCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular deconvolution operation. The resulting PixelDCL can be used to replace any deconvolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelDCL may result in slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelDCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than deconvolutional layers. When used in image generation tasks, our PixelDCL can largely overcome the checkerboard problem suffered by regular deconvolution operations.