CV · May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

arXiv:2105.03404v2 · 40.3877 citations · Has Code

Originality: Incremental advance

AI Analysis

This work targets image classification and offers computer vision researchers a simpler alternative to convolutional and transformer-based models. The contribution appears incremental, since it builds on existing residual and MLP concepts.

The authors introduce ResMLP, a feedforward network built solely from multi-layer perceptrons. Trained with a data-efficient strategy, it achieves competitive accuracy/complexity trade-offs on ImageNet, and it also shows promising results in self-supervised learning and machine translation.

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
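The two alternating sublayers described in the abstract can be sketched as a single forward pass in NumPy. This is an illustrative sketch, not the authors' released code: the function and parameter names (`resmlp_block`, `init_params`, `w_patch`, and so on) are hypothetical, the GELU nonlinearity and the random initialization are assumptions, and any normalization or scaling layers are omitted for brevity.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (the choice of nonlinearity is an assumption).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def resmlp_block(x, w_patch, b_patch, w1, b1, w2, b2):
    """One residual block applied to x of shape (num_patches, dim)."""
    # (i) Cross-patch linear layer: mixes patches (rows) using the same
    #     weights for every channel (column), plus a residual connection.
    x = x + (w_patch @ x + b_patch[:, None])
    # (ii) Two-layer feed-forward network: mixes channels, applied to each
    #      patch independently, plus a residual connection.
    hidden = gelu(x @ w1 + b1)
    return x + (hidden @ w2 + b2)


def init_params(num_patches, dim, hidden_dim, seed=0):
    # Hypothetical helper: small random weights and zero biases, for demonstration only.
    rng = np.random.default_rng(seed)
    return {
        "w_patch": rng.normal(scale=0.02, size=(num_patches, num_patches)),
        "b_patch": np.zeros(num_patches),
        "w1": rng.normal(scale=0.02, size=(dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=0.02, size=(hidden_dim, dim)),
        "b2": np.zeros(dim),
    }


# Toy sizes chosen for illustration: 16 patches, 8 channels, hidden width 32.
params = init_params(num_patches=16, dim=8, hidden_dim=32)
x = np.random.default_rng(1).normal(size=(16, 8))
y = resmlp_block(x, **params)
print(y.shape)  # (16, 8)
```

Because both sublayers preserve the `(num_patches, dim)` shape, blocks of this form can be stacked directly; a full classifier would additionally need a patch-embedding step before the blocks and a pooled linear head after them.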
