LG · AI · Feb 5, 2025

(GG) MoE vs. MLP on Tabular Data

arXiv:2502.03608v1 · 3 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The work addresses the need for more efficient and scalable models for tabular data and offers a promising alternative to traditional MLPs. It is an incremental advance because it builds on existing MoE methods.

The paper tackles inefficient neural network architectures for tabular data by proposing GG MoE, a mixture-of-experts model with a Gumbel-Softmax gating function. With an embedding layer, GG MoE achieves the highest performance across 38 datasets compared with standard MoE and MLP models, and both MoE and GG MoE use significantly fewer parameters than MLPs.

In recent years, significant efforts have been directed toward adapting modern neural network architectures for tabular data. However, despite their larger number of parameters and longer training and inference times, these models often fail to consistently outperform vanilla multilayer perceptron (MLP) neural networks. Moreover, MLP-based ensembles have recently demonstrated superior performance and efficiency compared to advanced deep learning methods. Therefore, rather than focusing on building deeper and more complex deep learning models, we propose investigating whether MLP neural networks can be replaced with more efficient architectures without sacrificing performance. In this paper, we first introduce GG MoE, a mixture-of-experts (MoE) model with a Gumbel-Softmax gating function. We then demonstrate that GG MoE with an embedding layer achieves the highest performance across $38$ datasets compared to standard MoE and MLP models. Finally, we show that both MoE and GG MoE utilize significantly fewer parameters than MLPs, making them a promising alternative for scaling and ensemble methods.
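To make the gating idea in the abstract concrete, here is a minimal NumPy sketch of a mixture-of-experts forward pass whose gate weights are drawn with the Gumbel-Softmax relaxation. It is not the paper's implementation: the function names, layer sizes, temperature `tau`, single-hidden-layer experts, and the dense evaluation of every expert are all assumptions chosen for illustration, and the embedding layer mentioned in the abstract is omitted.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    noise = rng.gumbel(size=logits.shape)
    z = (logits + noise) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_moe(n_features, n_experts, hidden, n_out, rng):
    """Hypothetical parameter layout: a linear gate plus one small MLP per expert."""
    return {
        "gate_w": rng.normal(0.0, 0.1, (n_features, n_experts)),
        "w1": rng.normal(0.0, 0.1, (n_experts, n_features, hidden)),
        "b1": np.zeros((n_experts, hidden)),
        "w2": rng.normal(0.0, 0.1, (n_experts, hidden, n_out)),
        "b2": np.zeros((n_experts, n_out)),
    }


def moe_forward(x, params, tau, rng):
    """Weighted sum of expert outputs, with weights from a Gumbel-Softmax gate."""
    gate_logits = x @ params["gate_w"]               # (batch, n_experts)
    weights = gumbel_softmax(gate_logits, tau, rng)  # (batch, n_experts), rows sum to 1
    # Run every expert on every row (dense evaluation, an assumption of this sketch).
    h = np.einsum("bf,efh->beh", x, params["w1"]) + params["b1"]
    h = np.maximum(h, 0.0)                           # ReLU
    out = np.einsum("beh,eho->beo", h, params["w2"]) + params["b2"]
    y = np.einsum("be,beo->bo", weights, out)        # mix expert outputs per row
    return y, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 rows, 8 tabular features
params = init_moe(n_features=8, n_experts=3, hidden=16, n_out=2, rng=rng)
y, weights = moe_forward(x, params, tau=0.5, rng=rng)
print(y.shape, weights.sum(axis=1))
```

Lowering `tau` pushes each row's gate weights toward a one-hot choice of expert, while a larger `tau` spreads weight more evenly across experts. The noise makes the gate stochastic, so repeated calls on the same input can yield different mixtures.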

