CV · AI · LG · May 28, 2022

WaveMix: A Resource-efficient Neural Network for Image Analysis

arXiv:2205.14375v5 · 31 citations · h-index: 24
Originality: Highly original
AI Analysis

This work addresses the need for more efficient and scalable neural networks in computer vision and offers potential savings in time, cost, and energy for practitioners, though its improvements over existing architectures are incremental.

The paper tackles resource inefficiency in neural networks for image analysis by proposing WaveMix, a novel architecture built on the multi-level two-dimensional discrete wavelet transform (2D-DWT). WaveMix matches or exceeds the accuracy of state-of-the-art methods while using fewer parameters, less GPU RAM, and fewer computations, and it sets new benchmarks on tasks such as segmentation on Cityscapes and classification on Places-365, among other datasets.

We propose WaveMix, a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. While using fewer trainable parameters, less GPU RAM, and fewer computations, WaveMix networks achieve comparable or better accuracy than state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks. This efficiency can translate into savings in time, cost, and energy. To achieve these gains, we use the multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) it reorganizes spatial information based on three strong image priors (scale-invariance, shift-invariance, and sparseness of edges), (2) it does so losslessly and without adding parameters, (3) it reduces the spatial sizes of feature maps, which reduces the memory and time required for forward and backward passes, and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar, resolution-preserving WaveMix blocks, which allows architectural flexibility across tasks and levels of resource availability. WaveMix establishes new benchmarks for segmentation on Cityscapes and for classification on Galaxy 10 DECals, Places-365, five EMNIST datasets, and iNAT-mini, and it performs competitively on other benchmarks. Our code and trained models are publicly available.
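Advantages (2) to (4) of the 2D-DWT can be checked on a toy example. The sketch below is not the authors' implementation: it assumes the orthonormal Haar wavelet, works on a single-channel matrix in pure Python, and every function name (`haar_dwt2`, `multilevel_haar`, `haar_receptive_field`, and so on) is hypothetical. It shows that one DWT level turns an H×W map into four (H/2)×(W/2) bands with no learnable weights, that the multi-level transform keeps exactly as many coefficients as its input and inverts exactly, and that a level-L coefficient summarizes a 2^L×2^L input patch, compared with (2L+1)×(2L+1) for L stacked 3×3 stride-1 convolutions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Details = Tuple[Matrix, Matrix, Matrix]  # (LH, HL, HH) detail bands of one level


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar DWT on an even-sized matrix.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields one coefficient in
    each of four half-resolution bands: same number of values, no weights.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # approximation (scaled local average)
            row_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            row_hl.append((a - b + c - d) / 2)  # left column minus right column
            row_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Exact inverse of haar_dwt2: rebuilds each 2x2 block from its four coefficients."""
    out: Matrix = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, v, hz, dg = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + v + hz + dg) / 2, (s + v - hz - dg) / 2]     # a, b
            bottom += [(s - v + hz - dg) / 2, (s - v - hz + dg) / 2]  # c, d
        out += [top, bottom]
    return out


def multilevel_haar(x: Matrix, levels: int) -> Tuple[Matrix, List[Details]]:
    """Apply haar_dwt2 repeatedly to the approximation band (finest level first)."""
    approx, details = x, []
    for _ in range(levels):
        approx, lh, hl, hh = haar_dwt2(approx)
        details.append((lh, hl, hh))
    return approx, details


def multilevel_ihaar(approx: Matrix, details: List[Details]) -> Matrix:
    """Undo multilevel_haar, inverting from the coarsest level back to the finest."""
    for lh, hl, hh in reversed(details):
        approx = haar_idwt2(approx, lh, hl, hh)
    return approx


def count_coefficients(approx: Matrix, details: List[Details]) -> int:
    """Total number of values stored across all bands."""
    bands = [approx] + [band for triple in details for band in triple]
    return sum(len(b) * len(b[0]) for b in bands)


def haar_receptive_field(level: int) -> int:
    """Side length of the input patch behind one coefficient at `level`."""
    return 2 ** level


def conv3x3_receptive_field(layers: int) -> int:
    """Receptive-field side length of `layers` stacked 3x3, stride-1 convolutions."""
    return 2 * layers + 1


# Usage: a 4x4 "image" decomposed over two levels.
x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
approx, details = multilevel_haar(x, levels=2)
print(approx)                                  # [[30.0]]
print(count_coefficients(approx, details))     # 16, same as the input
print(multilevel_ihaar(approx, details) == x)  # True
```

Stacking the four bands of one level along a channel axis would turn a C×H×W feature map into a 4C×(H/2)×(W/2) one, which is the sense in which the transform trades spatial size for channels without discarding information.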

Code implementations: 1 repo