ML LG · Jul 6, 2018

Multi-Task Learning with Incomplete Data for Healthcare

Xin J. Hunt, Saba Emrani, Ilknur Kaynar Kabul, Jorge Silva

arXiv:1807.02442v1 · 5.57 citations

Originality: Incremental advance

AI Analysis

This paper addresses a practical problem in healthcare data analysis, where missing values are common. The contribution is incremental, since it builds on existing robust multi-task learning methods.

The paper tackles the problem of missing features in multi-task learning for healthcare by proposing plug-in covariance matrix estimators, showing effectiveness in predicting Alzheimer's disease progression with incomplete data.

Multi-task learning is a type of transfer learning that trains multiple tasks simultaneously and leverages the information shared between related tasks to improve generalization performance. However, missing features in the input matrix pose a much more difficult problem that needs to be carefully addressed. Removing records with missing values can significantly reduce the sample size, which is impractical for datasets with a large percentage of missing values. Popular imputation methods often distort the covariance structure of the data, which causes inaccurate inference. In this paper we propose using plug-in covariance matrix estimators to tackle the challenge of missing features. Specifically, we analyze the plug-in estimators under the framework of robust multi-task learning with LASSO and graph regularization, where the graph regularization captures the relatedness between tasks. We use the Alzheimer's disease progression dataset as an example to show that the proposed framework is effective for prediction and model estimation when data are missing.
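
As a rough illustration of the plug-in idea described in the abstract, the following Python sketch estimates second moments directly from incomplete data instead of imputing values, then plugs them into a LASSO objective. It is not the authors' implementation: it handles a single task, omits graph regularization, and uses a pairwise-available estimator chosen here for simplicity. The function names `plugin_moments` and `plugin_lasso` are hypothetical. The sketch assumes that missing entries are marked with NaN, that features are centered, that the response is fully observed, and that entries are missing completely at random.

```python
import numpy as np


def plugin_moments(X, y):
    """Estimate second moments from X with NaN-marked missing entries.

    Sigma_hat[j, k] averages x_j * x_k over the rows where both features
    are observed; gamma_hat[j] averages x_j * y over the rows where
    feature j is observed. (Hypothetical helper, not from the paper.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(X)                # True where a value is present
    Z = np.where(observed, X, 0.0)         # zero-fill so products drop out
    O = observed.astype(float)
    pair_counts = O.T @ O                  # rows where j and k both observed
    feat_counts = O.sum(axis=0)            # rows where j is observed
    if (pair_counts == 0).any():
        raise ValueError("some feature pair is never observed together")
    Sigma_hat = (Z.T @ Z) / pair_counts
    gamma_hat = (Z.T @ y) / feat_counts
    return Sigma_hat, gamma_hat


def plugin_lasso(Sigma_hat, gamma_hat, lam, n_iter=500):
    """Proximal-gradient (ISTA) iterations for the plug-in objective

        0.5 * b' Sigma_hat b - gamma_hat' b + lam * ||b||_1

    Assumption: Sigma_hat is symmetric with a positive largest eigenvalue.
    """
    top_eig = np.linalg.eigvalsh(Sigma_hat).max()
    step = 1.0 / max(top_eig, 1e-12)
    beta = np.zeros_like(gamma_hat, dtype=float)
    for _ in range(n_iter):
        grad = Sigma_hat @ beta - gamma_hat
        v = beta - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 5
    X_full = rng.standard_normal((n, p))
    beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
    y = X_full @ beta_true + 0.1 * rng.standard_normal(n)

    # Hide roughly 20% of the entries completely at random.
    X_miss = X_full.copy()
    X_miss[rng.random((n, p)) < 0.2] = np.nan

    Sigma_hat, gamma_hat = plugin_moments(X_miss, y)
    print(plugin_lasso(Sigma_hat, gamma_hat, lam=0.1))
```

A pairwise estimate of this kind is not guaranteed to be positive semidefinite, so the resulting objective can be non-convex when many entries are missing. A practical version would need to handle that case, for example by constraining the solution or by adjusting the estimate.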
