Learning from data with structured missingness
This work identifies a critical issue for machine learning practitioners dealing with large-scale heterogeneous data, but its contribution is incremental: it primarily reviews the literature and proposes challenges without presenting new methods or results.
The paper addresses the problem of structured missingness in machine learning, where missing values exhibit associations, and outlines current literature and grand challenges to tackle this fundamental hindrance to learning at scale.
Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random', there exists a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious and seek to learn from ever-larger volumes of heterogeneous data, an increasingly common problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such 'structured missingness' raises a range of challenges that have not yet been systematically addressed and presents a fundamental hindrance to machine learning at scale. Here, we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
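As an illustration of what "association or structure" among missing values can mean, here is a minimal sketch that is not taken from the paper. It simulates missingness masks over two hypothetical columns, `a` and `b`: in one mask the entries go missing independently, and in the other `b` is missing exactly when `a` is missing, which is one simple form of structure. The function names, the missingness probability, and the block mechanism are all illustrative assumptions.

```python
import random


def independent_mask(n_rows, p_missing, rng):
    """Each of the two columns goes missing independently with probability p_missing."""
    return [(rng.random() < p_missing, rng.random() < p_missing)
            for _ in range(n_rows)]


def structured_mask(n_rows, p_missing, rng):
    """Column b is missing exactly when column a is missing (an assumed block pattern)."""
    mask = []
    for _ in range(n_rows):
        a_missing = rng.random() < p_missing
        mask.append((a_missing, a_missing))
    return mask


def co_missing_rate(mask):
    """Estimate P(b missing | a missing) from a list of (a_missing, b_missing) pairs."""
    rows_with_a_missing = [row for row in mask if row[0]]
    if not rows_with_a_missing:
        return float("nan")
    return sum(1 for row in rows_with_a_missing if row[1]) / len(rows_with_a_missing)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ind = independent_mask(10_000, 0.3, rng)
struct = structured_mask(10_000, 0.3, rng)

# Under independence the conditional rate stays near p_missing;
# under the block pattern it is 1.0 by construction.
print("independent:", co_missing_rate(ind))
print("structured: ", co_missing_rate(struct))
```

Comparing the conditional rate against the marginal rate is only one crude diagnostic. Real structured missingness may span many variables and arise implicitly, for example when heterogeneous data sources are merged.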