LG · Jul 16, 2024

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets

arXiv:2407.11540v2 · 18 citations · h-index: 31 · Has Code
AI Analysis

This paper addresses a common challenge: training AI models on incomplete data. Its approach could benefit data scientists and practitioners, though it appears incremental because it builds on existing transformer architectures.

The paper tackles missing values in tabular datasets by introducing NAIM, a transformer-based model that avoids traditional imputation techniques. It reports superior performance over 11 competing models (6 machine learning and 5 deep learning) across 5 datasets.

Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method" (NAIM), a novel transformer-based model specifically designed to address this issue without the need for traditional imputation techniques. NAIM avoids imputing missing values and learns effectively from the available data by relying on two main techniques: feature-specific embeddings that encode both categorical and numerical features while also handling missing inputs, and a modified masked self-attention mechanism that completely masks out the contributions of missing data. Additionally, a novel regularization technique is introduced to enhance the model's ability to generalize from incomplete data. We extensively evaluated NAIM on 5 publicly available tabular datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 5 deep learning models, each paired with 3 different imputation techniques when necessary. The results highlight the efficacy of NAIM in improving predictive performance and resilience in the presence of missing data. To facilitate further research and practical application in handling missing data without traditional imputation methods, we made the code for NAIM available at https://github.com/cosbidev/NAIM.
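The masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal single-head example written with the Python standard library only, and the function name `masked_self_attention`, the `missing` flag list, and the choice to return a zero vector for a missing feature are all assumptions made for illustration. It shows one way attention scores toward missing features can be set to negative infinity so that they receive zero weight after the softmax.

```python
import math

def masked_self_attention(queries, keys, values, missing):
    """Scaled dot-product attention that ignores missing features.

    queries, keys, values: lists of equal-length float vectors, one per feature.
    missing: list of bools, True where the feature value is absent.
    Hypothetical helper for illustration; not NAIM's actual code.
    """
    d = len(keys[0])
    width = len(values[0])
    out = []
    for i, q in enumerate(queries):
        if missing[i]:
            # Assumption: a missing feature produces an all-zero output row.
            out.append([0.0] * width)
            continue
        # Missing keys get a score of -inf, so their softmax weight is 0.
        scores = [
            -math.inf if missing[j]
            else sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for j, k in enumerate(keys)
        ]
        top = max(scores)  # finite, because feature i itself is observed
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) is 0.0
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors over the observed features.
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(width)
        ])
    return out
```

With three feature vectors where the third is flagged as missing, the outputs for the first two features do not depend on whatever placeholder is stored in the third slot, which is the property that removes the need to impute a value there.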

Code Implementations: 1 repo