Wen-Liang Hwang

LG
8papers
7citations
Novelty34%
AI Score21

8 Papers

LGJun 13, 2022
Analysis of function approximation and stability of general DNNs in directed acyclic graphs using un-rectifying analysis

Wen-Liang Hwang, Shih-Shuo Tung

A general lack of understanding pertaining to deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools with which to analyze the composition of non-linear functions, and partly to a lack of mathematical models applicable to the diversity of DNN architectures. In this paper, we made a number of basic assumptions pertaining to activation functions, non-linear transformations, and DNN architectures in order to use the un-rectifying method to analyze DNNs via directed acyclic graphs (DAGs). DNNs that satisfy these assumptions are referred to as general DNNs. Our construction of an analytic graph was based on an axiomatic method in which DAGs are built from the bottom-up through the application of atomic operations to basic elements in accordance with regulatory rules. This approach allows us to derive the properties of general DNNs via mathematical induction. We show that using the proposed approach, some properties hold true for general DNNs can be derived. This analysis advances our understanding of network functions and could promote further theoretical insights if the host of analytical tools for graphs can be leveraged.

LGJun 16, 2023
Representation and decomposition of functions in DAG-DNNs and structural network pruning

Wen-Liang Hwang

The conclusions provided by deep neural networks (DNNs) must be carefully scrutinized to determine whether they are universal or architecture dependent. The term DAG-DNN refers to a graphical representation of a DNN in which the architecture is expressed as a direct-acyclic graph (DAG), on which arcs are associated with functions. The level of a node denotes the maximum number of hops between the input node and the node of interest. In the current study, we demonstrate that DAG-DNNs can be used to derive all functions defined on various sub-architectures of the DNN. We also demonstrate that the functions defined in a DAG-DNN can be derived via a sequence of lower-triangular matrices, each of which provides the transition of functions defined in sub-graphs up to nodes at a specified level. The lifting structure associated with lower-triangular matrices makes it possible to perform the structural pruning of a network in a systematic manner. The fact that decomposition is universally applicable to all DNNs means that network pruning could theoretically be applied to any DNN, regardless of the underlying architecture. We demonstrate that it is possible to obtain the winning ticket (sub-network and initialization) for a weak version of the lottery ticket hypothesis, based on the fact that the sub-network with initialization can achieve training performance on par with that of the original network using the same number of iterations or fewer.

LGJul 31, 2024
Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight

Wen-Liang Hwang

In large-scale learning algorithms, the momentum term is usually included in the stochastic sub-gradient method to improve the learning speed because it can navigate ravines efficiently to reach a local minimum. However, step-size and momentum weight hyper-parameters must be appropriately tuned to optimize convergence. We thus analyze the convergence rate using stochastic programming with Polyak's acceleration of two commonly used step-size learning rates: ``diminishing-to-zero" and ``constant-and-drop" (where the sequence is divided into stages and a constant step-size is applied at each stage) under strongly convex functions over a compact convex set with bounded sub-gradients. For the former, we show that the convergence rate can be written as a product of exponential in step-size and polynomial in momentum weight. Our analysis justifies the convergence of using the default momentum weight setting and the diminishing-to-zero step-size sequence in large-scale machine learning software. For the latter, we present the condition for the momentum weight sequence to converge at each stage.

MLJul 29, 2024
Generalization bounds for regression and classification on adaptive covering input domains

Wen-Liang Hwang

Our main focus is on the generalization bound, which serves as an upper limit for the generalization error. Our analysis delves into regression and classification tasks separately to ensure a thorough examination. We assume the target function is real-valued and Lipschitz continuous for regression tasks. We use the 2-norm and a root-mean-square-error (RMSE) variant to measure the disparities between predictions and actual values. In the case of classification tasks, we treat the target function as a one-hot classifier, representing a piece-wise constant function, and employ 0/1 loss for error measurement. Our analysis underscores the differing sample complexity required to achieve a concentration inequality of generalization bounds, highlighting the variation in learning efficiency for regression and classification tasks. Furthermore, we demonstrate that the generalization bounds for regression and classification functions are inversely proportional to a polynomial of the number of parameters in a network, with the degree depending on the hypothesis class and the network architecture. These findings emphasize the advantages of over-parameterized networks and elucidate the conditions for benign overfitting in such systems.

LGJan 18, 2021
Learning DNN networks using un-rectifying ReLU with compressed sensing application

Wen-Liang Hwang, Shih-Shuo Tung

The un-rectifying technique expresses a non-linear point-wise activation function as a data-dependent variable, which means that the activation variable along with its input and output can all be employed in optimization. The ReLU network in this study was un-rectified means that the activation functions could be replaced with data-dependent activation variables in the form of equations and constraints. The discrete nature of activation variables associated with un-rectifying ReLUs allows the reformulation of deep learning problems as problems of combinatorial optimization. However, we demonstrate that the optimal solution to a combinatorial optimization problem can be preserved by relaxing the discrete domains of activation variables to closed intervals. This makes it easier to learn a network using methods developed for real-domain constrained optimization. We also demonstrate that by introducing data-dependent slack variables as constraints, it is possible to optimize a network based on the augmented Lagrangian approach. This means that our method could theoretically achieve global convergence and all limit points are critical points of the learning problem. In experiments, our novel approach to solving the compressed sensing recovery problem achieved state-of-the-art performance when applied to the MNIST database and natural images.

LGMar 29, 2019
Deep Representation with ReLU Neural Networks

Andreas Heinecke, Wen-Liang Hwang

We consider deep feedforward neural networks with rectified linear units from a signal processing perspective. In this view, such representations mark the transition from using a single (data-driven) linear representation to utilizing a large collection of affine linear representations tailored to particular regions of the signal space. This paper provides a precise description of the individual affine linear representations and corresponding domain regions that the (data-driven) neural network associates to each signal of the input space. In particular, we describe atomic decompositions of the representations and, based on estimating their Lipschitz regularity, suggest some conditions that can stabilize learning independent of the network depth. Such an analysis may promote further theoretical insight from both the signal processing and machine learning communities.

SPJan 6, 2018
Frame-based Sparse Analysis and Synthesis Signal Representations and Parseval K-SVD

Wen-Liang Hwang, Ping-Tzan Huang, Tai-Lang Jong

Frames are the foundation of the linear operators used in the decomposition and reconstruction of signals, such as the discrete Fourier transform, Gabor, wavelets, and curvelet transforms. The emergence of sparse representation models has shifted of the emphasis in frame theory toward sparse l1-minimization problems. In this paper, we apply frame theory to the sparse representation of signals in which a synthesis dictionary is used for a frame and an analysis dictionary is used for a dual frame. We sought to formulate a novel dual frame design in which the sparse vector obtained through the decomposition of any signal is also the sparse solution representing signals based on a reconstruction frame. Our findings demonstrate that this type of dual frame cannot be constructed for over-complete frames, thereby precluding the use of any linear analysis operator in driving the sparse synthesis coefficient for signal representation. Nonetheless, the best approximation to the sparse synthesis solution can be derived from the analysis coefficient using the canonical dual frame. In this study, we developed a novel dictionary learning algorithm (called Parseval K-SVD) to learn a tight-frame dictionary. We then leveraged the analysis and synthesis perspectives of signal representation with frames to derive optimization formulations for problems pertaining to image recovery. Our preliminary, results demonstrate that the images recovered using this approach are correlated to the frame bounds of dictionaries, thereby demonstrating the importance of using different dictionaries for different applications.

CVDec 14, 2015
Sparse Representation of a Blur Kernel for Blind Image Restoration

Chia-Chen Lee, Wen-Liang Hwang

Blind image restoration is a non-convex problem which involves restoration of images from an unknown blur kernel. The factors affecting the performance of this restoration are how much prior information about an image and a blur kernel are provided and what algorithm is used to perform the restoration task. Prior information on images is often employed to restore the sharpness of the edges of an image. By contrast, no consensus is still present regarding what prior information to use in restoring from a blur kernel due to complex image blurring processes. In this paper, we propose modelling of a blur kernel as a sparse linear combinations of basic 2-D patterns. Our approach has a competitive edge over the existing blur kernel modelling methods because our method has the flexibility to customize the dictionary design, which makes it well-adaptive to a variety of applications. As a demonstration, we construct a dictionary formed by basic patterns derived from the Kronecker product of Gaussian sequences. We also compare our results with those derived by other state-of-the-art methods, in terms of peak signal to noise ratio (PSNR).