LG ST ML | Oct 23, 2020

Learning from missing data with the Latent Block Model

Gabriel Frisch, Jean-Benoist Léger, Yves Grandvalet

arXiv:2010.12222v11.2

Originality: Incremental advance

AI Analysis

This work addresses the risk of drawing misleading conclusions when informative missing data is ignored in statistical modeling, with applications such as political analysis. It appears incremental because it builds on the existing Latent Block Model framework.

The paper tackles informative missing data (Missing Not At Random, MNAR) by proposing a co-clustering model, based on the Latent Block Model, that extracts information from nonresponses. It demonstrates the model's effectiveness through a simulation study and an analysis of French Parliament voting records, which reveals relevant groups and an interpretation of non-voter behavior.

Missing data can be informative. Ignoring this information can lead to misleading conclusions when the data model does not allow information to be extracted from the missing data. We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of these nonignorable nonresponses, also known as Missing Not At Random (MNAR) data. A variational expectation-maximization algorithm is derived to perform inference, and a model selection criterion is presented. We assess the proposed approach in a simulation study before applying our model to the voting records of the lower house of the French Parliament, where our analysis brings out relevant groups of MPs and texts, together with a sensible interpretation of the behavior of non-voters.
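To make the first claim of the abstract concrete, the sketch below simulates a small binary matrix from a two-by-two latent block structure and then hides entries with a probability that depends on the hidden value itself, which is one simple form of MNAR. It is not the authors' model or their inference algorithm: the function names, block probabilities, and missingness rates are all hypothetical values chosen for illustration. It only shows that a summary computed from the observed entries alone can be far from the truth.

```python
import random


def simulate_lbm_mnar(n_rows=200, n_cols=100, seed=0):
    """Draw a binary matrix from a 2x2 latent block structure, then hide
    entries with a probability that depends on the entry's own value (MNAR).
    All numeric parameters are illustrative assumptions."""
    rng = random.Random(seed)
    # Hypothetical block parameters: P(X_ij = 1 | row cluster k, column cluster l).
    pi = [[0.9, 0.2], [0.3, 0.8]]
    # Hypothetical missingness rates: zeros are hidden far more often than ones.
    p_missing = {1: 0.1, 0: 0.6}
    row_cluster = [rng.randrange(2) for _ in range(n_rows)]
    col_cluster = [rng.randrange(2) for _ in range(n_cols)]
    full, observed = [], []
    for i in range(n_rows):
        full_row, obs_row = [], []
        for j in range(n_cols):
            x = 1 if rng.random() < pi[row_cluster[i]][col_cluster[j]] else 0
            hidden = rng.random() < p_missing[x]
            full_row.append(x)
            obs_row.append(None if hidden else x)  # None marks a missing entry
        full.append(full_row)
        observed.append(obs_row)
    return full, observed


def mean_of_observed(matrix):
    """Average of all non-missing entries of a matrix given as a list of rows."""
    values = [x for row in matrix for x in row if x is not None]
    return sum(values) / len(values)


full, observed = simulate_lbm_mnar()
true_rate = mean_of_observed(full)        # uses every entry, including hidden ones
naive_rate = mean_of_observed(observed)   # ignores the missingness mechanism
print(f"true rate of ones:           {true_rate:.3f}")
print(f"rate among observed entries: {naive_rate:.3f}")
```

Because zeros are hidden more often than ones in this toy setup, the rate computed from observed entries overestimates the true rate. A model that accounts for the missingness process, as the paper proposes, is meant to avoid this kind of distortion.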
