NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications
The paper introduces a new foundation operation for computer vision architectures that could affect model design and efficiency, though the contribution appears incremental because it builds on existing paradigms.
The paper addresses the small set of foundation operations available to computer vision architectures by proposing NeoCell, a novel operator based on patch-wise matrix multiplications, and shows that NeoNeXt models built on it achieve competitive quality on ImageNet-1K classification.
Most computer vision architectures today are built upon well-known foundation operations: fully-connected layers, convolutions, and multi-head self-attention blocks. In this paper we propose a novel foundation operation, NeoCell, which learns matrix patterns and performs patch-wise matrix multiplications with the input data. The main advantages of the proposed operator are (1) a simple implementation that does not need operations like im2col, (2) low computational complexity (especially for large matrices), and (3) simple and flexible implementation of up-/down-sampling. We validate the NeoNeXt family of models, which is based on this operation, on the ImageNet-1K classification task and show that they achieve competitive quality.
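
To make the idea of a patch-wise matrix multiplication concrete, here is a minimal dependency-free sketch. The abstract does not spell out the exact formulation, so the code assumes that each patch X_ij of a single channel is transformed as Y_ij = A · X_ij · B. The function names `matmul` and `patchwise_matmul` are hypothetical, and the matrices A and B are fixed by hand here, whereas in the paper they are learned.

```python
def matmul(P, Q):
    """Plain dense matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def patchwise_matmul(X, A, B):
    """Apply Y_ij = A @ X_ij @ B to every patch of one channel (assumed form).

    X: H x W input, A: h_out x h_in, B: w_in x w_out, all lists of rows.
    H must be divisible by h_in and W by w_in.
    """
    h_out, h_in = len(A), len(A[0])
    w_in, w_out = len(B), len(B[0])
    H, W = len(X), len(X[0])
    if H % h_in or W % w_in:
        raise ValueError("input size must be divisible by the patch size")
    n_rows, n_cols = H // h_in, W // w_in
    Y = [[0.0] * (n_cols * w_out) for _ in range(n_rows * h_out)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Slice out patch (i, j) directly; no im2col-style unfolding is needed.
            patch = [row[j * w_in:(j + 1) * w_in]
                     for row in X[i * h_in:(i + 1) * h_in]]
            out = matmul(matmul(A, patch), B)
            # Write the transformed patch into its slot in the output grid.
            for p in range(h_out):
                for q in range(w_out):
                    Y[i * h_out + p][j * w_out + q] = out[p][q]
    return Y


X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# Non-square matrices change the resolution: a 1x2 A and a 2x1 B halve both axes.
# With these hand-picked weights the result is 2x2 average pooling.
down = patchwise_matmul(X, [[0.5, 0.5]], [[0.5], [0.5]])
print(down)  # [[3.5, 5.5], [11.5, 13.5]]

# A 2x1 A and a 1x2 B double both axes (nearest-neighbour upsampling here).
up = patchwise_matmul([[1, 2], [3, 4]], [[1], [1]], [[1, 1]])
print(up)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Under this assumed form, up- and down-sampling come down to choosing the shapes of A and B, which is one way to read advantage (3) in the abstract. A practical implementation would batch the patches into a single tensor contraction rather than loop in Python.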