ML · LG · Nov 6, 2018

Comparison of Discrete Choice Models and Artificial Neural Networks in Presence of Missing Variables

arXiv:1811.02284v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work provides practical guidelines for selecting classifiers in real-world applications where some explanatory variables are unobserved, but it is incremental: it compares existing methods without introducing new techniques.

The paper compares discrete choice models and artificial neural networks for binary classification when explanatory variables are missing, and finds that neural networks generally outperform discrete choice models, except when the class distribution in the training data is highly unbalanced.

Classification, the process of assigning a label (or class) to an observation given its features, is a common task in many applications. Nonetheless, in most real-life applications, the labels cannot be fully explained by the observed features. Indeed, many factors can be hidden from the modellers. The unexplained variation is then treated as random noise, which is handled differently depending on the method retained by the practitioner. This work focuses on two simple and widely used supervised classification algorithms, discrete choice models and artificial neural networks, in the context of binary classification. Through various numerical experiments involving continuous or discrete explanatory features, we present a comparison of the retained methods' performance in the presence of missing variables. The impact of the distribution of the two classes in the training data is also investigated. The outcomes of those experiments highlight the fact that artificial neural networks outperform discrete choice models, except when the distribution of the classes in the training data is highly unbalanced. Finally, this work provides some guidelines for choosing the right classifier with respect to the training data.
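
To make the "missing variable" setting concrete, here is a minimal sketch of one such experiment, covering the discrete choice side only. Everything in it is an illustrative assumption rather than the paper's setup: the data-generating process (a binary logit with two standard-normal features and coefficients of 2.0), the sample sizes, the hypothetical helper names `simulate`, `fit_logit` and `accuracy`, and the plain gradient-ascent fit. The same binary logit is fitted twice, once with both features and once with the second feature hidden, so the hidden feature's effect ends up in the noise term.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, rng, intercept=0.0):
    """Draw (x_observed, x_hidden, label) rows from an assumed binary logit."""
    rows = []
    for _ in range(n):
        x_obs, x_hid = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(intercept + 2.0 * x_obs + 2.0 * x_hid)
        rows.append((x_obs, x_hid, 1 if rng.random() < p else 0))
    return rows

def fit_logit(X, y, epochs=150, lr=0.5):
    """Fit a binary logit by full-batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = yi - sigmoid(z)  # gradient of the log-likelihood w.r.t. z
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Share of observations whose predicted class (z >= 0) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
        hits += int((z >= 0) == (yi == 1))
    return hits / len(y)

rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = simulate(2000, rng), simulate(2000, rng)
y_train = [r[2] for r in train]
y_test = [r[2] for r in test]

# Model 1: both explanatory variables are observed.
w_full = fit_logit([(r[0], r[1]) for r in train], y_train)
acc_full = accuracy(w_full, [(r[0], r[1]) for r in test], y_test)

# Model 2: the second variable is missing, so its effect becomes noise.
w_missing = fit_logit([(r[0],) for r in train], y_train)
acc_missing = accuracy(w_missing, [(r[0],) for r in test], y_test)

print(f"accuracy with both variables:   {acc_full:.3f}")
print(f"accuracy with one variable missing: {acc_missing:.3f}")
```

Under these assumptions, changing the `intercept` argument of `simulate` shifts the share of the two classes, which is one simple way to probe the unbalanced case. The sketch does not include the neural network side of the comparison; a small feed-forward classifier trained on the same rows would be needed for that.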
