LG | AI | Jun 8, 2021

Muddling Label Regularization: Deep Learning for Tabular Datasets

arXiv:2106.04462v2 | 7 citations
AI Analysis

This paper offers a new deep learning approach for tabular data, a domain where deep learning was previously considered less effective than ensemble methods.

The paper addresses deep learning on tabular datasets, where ensemble methods are traditionally preferred. It introduces Muddling Label Regularization (MLR), which penalizes memorization by generating uninformative labels and applying a differentiable closed-form regularization scheme on the last hidden layer during training. The authors report that MLR outperforms classical neural networks and ensemble methods such as GBDT and RF on several UCI and Kaggle datasets.

Deep Learning (DL) is considered the state of the art in computer vision, speech recognition and natural language processing. Until recently, it was also widely accepted that DL is irrelevant for learning tasks on tabular data, especially in the small sample regime where ensemble methods are acknowledged as the gold standard. We present a new end-to-end differentiable method to train a standard FFNN. Our method, Muddling labels for Regularization (MLR), penalizes memorization through the generation of uninformative labels and the application of a differentiable closed-form regularization scheme on the last hidden layer during training. MLR outperforms classical NN and the gold standard (GBDT, RF) for regression and classification tasks on several datasets from the UCI database and Kaggle covering a large range of sample sizes and feature-to-sample ratios. Researchers and practitioners can use MLR on its own as an off-the-shelf DL solution or integrate it into the most advanced ML pipelines.
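The two ingredients named in the abstract, uninformative labels and a closed-form regularization on the last hidden layer, can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's exact objective: it assumes the closed-form step is a ridge projection of the labels onto the last-hidden-layer activations `H`, that the uninformative labels are random permutations of `y`, and that the criterion is the fit on true labels minus the average fit on permuted labels. The function names, the `lam` and `n_perm` parameters, and the exact form of the criterion are hypothetical.

```python
import numpy as np

def ridge_hat_matrix(H, lam):
    """Closed-form ridge projector P = H (H^T H + lam I)^{-1} H^T."""
    d = H.shape[1]
    return H @ np.linalg.solve(H.T @ H + lam * np.eye(d), H.T)

def rmse(a, b):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def muddled_label_loss(H, y, lam=1.0, n_perm=8, seed=0):
    """Illustrative criterion (an assumption, not the paper's formula):
    fit error on the true labels minus the mean fit error on permuted labels.

    H : (n, d) activations of the last hidden layer
    y : (n,) regression targets
    """
    rng = np.random.default_rng(seed)
    P = ridge_hat_matrix(H, lam)
    # How well the ridge fit reproduces the true labels.
    fit_true = rmse(y, P @ y)
    # How well the same projector reproduces shuffled (uninformative) labels.
    fit_perm = np.mean(
        [rmse(y_perm, P @ y_perm)
         for y_perm in (rng.permutation(y) for _ in range(n_perm))]
    )
    # Strongly negative: true labels are fit well, shuffled ones are not.
    # Near zero: the representation fits anything, i.e. it memorizes.
    return fit_true - fit_perm
```

In an actual training loop this quantity would be computed with an automatic-differentiation framework so that gradients flow back through `H` to the network weights. NumPy is used here only to keep the sketch short and self-contained.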

Code Implementations: 1 repo
