LG | Jun 16, 2023

Representation and decomposition of functions in DAG-DNNs and structural network pruning

arXiv:2306.09707v1 | 1 citation | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the need for architecture-independent methods to analyze and prune DNNs. The advance is incremental, as it builds on existing pruning ideas such as the lottery ticket hypothesis.

The study addresses the problem of understanding and pruning deep neural networks (DNNs) by representing them as DAG-DNNs. It shows that the functions defined on a network can be decomposed via a sequence of lower-triangular matrices, which enables systematic structural pruning applicable to any DNN architecture. It also demonstrates that a sub-network with its initialization can achieve training performance on par with the original network using the same number of iterations or fewer.

The conclusions provided by deep neural networks (DNNs) must be carefully scrutinized to determine whether they are universal or architecture dependent. The term DAG-DNN refers to a graphical representation of a DNN in which the architecture is expressed as a directed acyclic graph (DAG), on which arcs are associated with functions. The level of a node denotes the maximum number of hops between the input node and the node of interest. In the current study, we demonstrate that DAG-DNNs can be used to derive all functions defined on various sub-architectures of the DNN. We also demonstrate that the functions defined in a DAG-DNN can be derived via a sequence of lower-triangular matrices, each of which provides the transition of functions defined in sub-graphs up to nodes at a specified level. The lifting structure associated with lower-triangular matrices makes it possible to perform the structural pruning of a network in a systematic manner. The fact that decomposition is universally applicable to all DNNs means that network pruning could theoretically be applied to any DNN, regardless of the underlying architecture. We demonstrate that it is possible to obtain the winning ticket (sub-network and initialization) for a weak version of the lottery ticket hypothesis, based on the fact that the sub-network with initialization can achieve training performance on par with that of the original network using the same number of iterations or fewer.
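
The notion of node level in the abstract can be illustrated with a small sketch. It is not the paper's construction: the paper's matrices carry functions, whereas this example uses plain 0/1 entries to show only why ordering nodes by level produces a lower-triangular structure. The function names `node_levels` and `arc_matrix`, the toy skip-connection graph, and the assumption of a single input node with no incoming arcs are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def node_levels(arcs):
    """Return {node: level}, where level is the maximum number of hops
    from the input node (assumed to be the only node without incoming arcs)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    level = {n: 0 for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    # Kahn's algorithm: a node is processed only after all its predecessors,
    # so level[u] is final when its outgoing arcs are relaxed.
    while ready:
        u = ready.pop()
        seen += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(nodes):
        raise ValueError("the arcs contain a cycle, so this is not a DAG")
    return level


def arc_matrix(arcs, level):
    """Order nodes by level and build a 0/1 matrix with
    mat[row][col] = 1 when there is an arc from node `col` to node `row`."""
    order = sorted(level, key=lambda n: (level[n], str(n)))
    index = {n: i for i, n in enumerate(order)}
    mat = [[0] * len(order) for _ in order]
    for u, v in arcs:
        mat[index[v]][index[u]] = 1
    return order, mat


# Toy architecture with a skip connection from the input to node "b".
arcs = [("in", "a"), ("a", "b"), ("in", "b"), ("b", "out")]
level = node_levels(arcs)
order, mat = arc_matrix(arcs, level)
```

Because every arc goes from a lower level to a strictly higher one, each nonzero entry falls below the diagonal. Removing an arc, a crude stand-in for structural pruning, clears one entry and leaves the triangular shape intact.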

