CV · Apr 8, 2023

MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision

Zhimin Zhu, Jianguo Zhao, Tong Mu, Yuliang Yang, Mengyu Zhu

arXiv:2304.03917v11.5 · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

This work is aimed at computer vision researchers developing MLP-based architectures. It is an incremental improvement that extends existing MLP designs with the ability to operate in multiple coordinate frames.

The paper argues that the same semantic information can be easier or harder for a vision MLP to learn depending on the coordinate frame in which the features are expressed. It introduces MC-MLP, which applies orthogonal transforms to the features to obtain receptive fields in multiple coordinate frames, and reports better image classification performance than most MLPs at the same parameter level.

In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered attention from researchers. This paper introduces MC-MLP, a general MLP-like backbone for computer vision that is composed of a series of fully-connected (FC) layers. In MC-MLP, we propose that the same semantic information has varying levels of difficulty in learning, depending on the coordinate frame of features. To address this, we perform an orthogonal transform on the feature information, equivalent to changing the coordinate frame of features. Through this design, MC-MLP is equipped with multi-coordinate frame receptive fields and the ability to learn information across different coordinate frames. Experiments demonstrate that MC-MLP outperforms most MLPs in image classification tasks, achieving better performance at the same parameter level. The code will be available at: https://github.com/ZZM11/MC-MLP.
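The idea that an orthogonal transform amounts to a change of coordinate frame can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the choice of an orthonormal DCT-II basis, the helper names (`dct_matrix`, `fc_in_frame`), and the toy weight matrix are all assumptions made for illustration. The sketch moves a feature vector into the new frame, applies a fully-connected layer there, and maps the result back with the transpose of the basis.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows.

    Assumption: the DCT is used here only as a convenient example of an
    orthogonal transform; the abstract does not name a specific one.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def fc_in_frame(x, weight, basis):
    """Apply a hypothetical FC layer in the frame defined by an orthogonal basis."""
    coords = matvec(basis, x)                # change of frame: x -> Q x
    mixed = matvec(weight, coords)           # FC layer acting in the new frame
    return matvec(transpose(basis), mixed)   # back to the original frame: Q^T y

n = 4
Q = dct_matrix(n)
x = [1.0, 2.0, 3.0, 4.0]

# With an identity weight, the layer reduces to Q^T Q x, which recovers x
# because Q is orthogonal.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
roundtrip = fc_in_frame(x, identity, Q)
```

Because the basis is orthogonal, the transform preserves vector norms and is undone by its transpose, so the layer sees the same information expressed in different coordinates. A multi-frame design in this spirit could run several such branches with different bases and combine their outputs, though how MC-MLP combines them is not stated in the abstract.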
