LG · Nov 19, 2020

Robustness to Missing Features using Hierarchical Clustering with Split Neural Networks

Rishab Khincha, Utkarsh Sarawgi, Wazeer Zulfikar, Pattie Maes

arXiv:2011.09596v1

Originality: Synthesis-oriented

AI Analysis

This work tackles the long-standing problem of missing data in machine learning, a common obstacle for practitioners and researchers, and offers an incremental improvement.

This paper addresses the challenge of missing data by proposing a method that clusters similar input features using hierarchical clustering and then trains proportionately split neural networks with a joint loss. This approach demonstrates promising improvements on benchmark datasets even when using simple imputation techniques.
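The first step, grouping similar input features with hierarchical clustering after a simple imputation, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The choice of column-mean imputation, the `1 - |Pearson correlation|` distance between features, average linkage, and the helper names `mean_impute` and `cluster_features` are all assumptions made for this sketch.

```python
import numpy as np


def mean_impute(X):
    """Simple imputation (assumed): replace NaNs in each column with that column's observed mean."""
    X = np.array(X, dtype=float)  # copy so the caller's array is left untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def cluster_features(X, n_clusters):
    """Average-linkage agglomerative clustering of the *columns* of X.

    Assumption: the distance between two features is 1 - |Pearson correlation|,
    so strongly (anti-)correlated features end up in the same cluster.
    Returns a list of clusters, each a sorted list of column indices.
    """
    n_features = X.shape[1]
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    clusters = [[j] for j in range(n_features)]  # start with one cluster per feature
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge the closest pair
        del clusters[b]
    return clusters
```

On a matrix whose columns 0 and 2 (and 1 and 3) are noisy copies of the same underlying signal, `cluster_features(mean_impute(X), 2)` should group them as `[[0, 2], [1, 3]]`. A library routine such as SciPy's hierarchical clustering could replace the hand-written loop; it is spelled out here only to keep the sketch dependency-free.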

The problem of missing data has persisted for a long time and poses a major obstacle in machine learning and statistical data analysis. Past work in this field has used various data imputation techniques to fill in the missing data, or has trained neural networks (NNs) with the missing data. In this work, we propose a simple yet effective approach that clusters similar input features together using hierarchical clustering and then trains proportionately split neural networks with a joint loss. We evaluate this approach on a series of benchmark datasets and show promising improvements even with simple imputation techniques. We attribute this to learning through clusters of similar features in our model architecture. The source code is available at https://github.com/usarawgi911/Robustness-to-Missing-Features
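The second step, training proportionately split neural networks with a joint loss, can be sketched the same way. The abstract does not spell out the architecture, so everything below is an assumption for illustration: one single-hidden-layer tanh sub-network per feature cluster, hidden units divided among clusters in proportion to cluster size, branch outputs summed into one regression prediction, and the "joint loss" taken to be a single mean-squared error on that summed prediction, minimized by plain gradient descent. `SplitNet` and `allocate_hidden_units` are hypothetical names, and the cluster lists are hard-coded rather than produced by a clustering step.

```python
import numpy as np


def allocate_hidden_units(clusters, total_hidden):
    """Assumed 'proportional split': each cluster gets hidden units in proportion to its size (at least 1)."""
    n_features = sum(len(c) for c in clusters)
    return [max(1, round(total_hidden * len(c) / n_features)) for c in clusters]


class SplitNet:
    """Hypothetical split network: one small tanh branch per feature cluster.

    Branch outputs are summed into a single prediction, so all branches are
    trained together through one (joint) loss on that prediction.
    """

    def __init__(self, clusters, total_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.clusters = clusters
        self.params = []
        for c, h in zip(clusters, allocate_hidden_units(clusters, total_hidden)):
            self.params.append({
                "W1": rng.normal(scale=1.0 / np.sqrt(len(c)), size=(len(c), h)),
                "b1": np.zeros(h),
                "w2": rng.normal(scale=1.0 / np.sqrt(h), size=h),
            })
        self.b2 = 0.0  # shared output bias

    def forward(self, X):
        # Each branch only sees the columns of its own cluster
        hiddens = [np.tanh(X[:, c] @ p["W1"] + p["b1"])
                   for c, p in zip(self.clusters, self.params)]
        pred = sum(h @ p["w2"] for h, p in zip(hiddens, self.params)) + self.b2
        return pred, hiddens

    def train_step(self, X, y, lr=0.05):
        """One gradient-descent step on the joint MSE loss; returns the loss before the update."""
        pred, hiddens = self.forward(X)
        grad_pred = 2.0 * (pred - y) / len(y)  # dL/dpred for L = mean((pred - y)^2)
        for c, p, h in zip(self.clusters, self.params, hiddens):
            # Backpropagate through this branch's output weights and tanh (uses w2 before its update)
            grad_h = np.outer(grad_pred, p["w2"]) * (1.0 - h ** 2)
            p["w2"] -= lr * (h.T @ grad_pred)
            p["W1"] -= lr * (X[:, c].T @ grad_h)
            p["b1"] -= lr * grad_h.sum(axis=0)
        self.b2 -= lr * grad_pred.sum()
        return float(np.mean((pred - y) ** 2))
```

For example, `SplitNet([[0, 2], [1, 3]], total_hidden=16)` gives each two-feature cluster 8 hidden units, and repeated calls to `train_step(X, y)` should reduce the loss on a simple synthetic target. Summing the branch outputs keeps every branch on a single shared objective; the actual paper may combine branches or weight per-branch losses differently.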

