LG AI ML · May 28, 2019

Using Ontologies To Improve Performance In Massively Multi-label Prediction Models

arXiv:1905.12126v1 · 1.81 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of rare labels in domains such as healthcare and biology, where precise predictions are crucial. It is, however, an incremental improvement over existing methods.

The paper tackles long-tailed label distributions in massively multi-label prediction by modifying a neural network's output layer to incorporate ontology relationships between labels. This yields significant improvements in AUROC and average precision for rare labels in disease and protein function prediction tasks.
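The output-layer change can be pictured with a small sketch. The abstract describes it as a "Bayesian network of sigmoids"; the code below reads each label's sigmoid as a probability conditional on its parent label being present, and multiplies these conditionals along the ontology path to obtain marginal probabilities. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a tree-shaped ontology (each label has at most one parent) and that a label cannot be positive unless its parent is. The function names and the toy labels are hypothetical, and multi-parent ontologies such as Gene Ontology are not handled here.

```python
import math
from typing import Dict, List, Optional


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def topological_order(parents: Dict[str, Optional[str]]) -> List[str]:
    """Return labels ordered so each parent precedes its children (tree assumed)."""
    order: List[str] = []
    seen = set()

    def visit(label: str) -> None:
        if label in seen:
            return
        parent = parents[label]
        if parent is not None:
            visit(parent)
        seen.add(label)
        order.append(label)

    for label in parents:
        visit(label)
    return order


def ontology_marginals(logits: Dict[str, float],
                       parents: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Convert per-label logits into marginal probabilities P(y_l = 1 | x).

    Assumption: sigmoid(logits[l]) = P(y_l = 1 | y_parent = 1, x) and
    P(y_l = 1 | y_parent = 0, x) = 0, so the marginal is the product of
    conditional sigmoids along the path from the root down to l.
    """
    probs: Dict[str, float] = {}
    for label in topological_order(parents):
        p = sigmoid(logits[label])
        parent = parents[label]
        if parent is not None:
            p *= probs[parent]  # parent already computed thanks to the ordering
        probs[label] = p
    return probs


def bce_loss(probs: Dict[str, float], targets: Dict[str, int],
             eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between marginal probabilities and 0/1 targets."""
    total = 0.0
    for label, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        y = targets[label]
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical three-level ontology: root -> child -> grandchild.
    parents = {
        "acute_mi": "ischemic_heart",
        "circulatory": None,
        "ischemic_heart": "circulatory",
    }
    logits = {"circulatory": 2.0, "ischemic_heart": 0.0, "acute_mi": 0.0}
    probs = ontology_marginals(logits, parents)
    for label, p in probs.items():
        print(f"{label}: {p:.4f}")
    targets = {"circulatory": 1, "ischemic_heart": 1, "acute_mi": 0}
    print("loss:", bce_loss(probs, targets))
```

Because each child's probability is its parent's probability times a factor in (0, 1), predictions in this sketch never rank a child above its parent. Training on the marginals also sends gradient from a rare label's loss into the logits of its more common ancestors, which is one plausible way information gets shared between rare and common labels.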

Massively multi-label prediction/classification problems arise in environments like healthcare or biology where very precise predictions are useful. One challenge with massively multi-label problems is that there is often a long-tailed frequency distribution for the labels, which results in few positive examples for the rare labels. We propose a solution to this problem by modifying the output layer of a neural network to create a Bayesian network of sigmoids which takes advantage of ontology relationships between the labels to help share information between the rare and the more common labels. We apply this method to the two massively multi-label tasks of disease prediction (ICD-9 codes) and protein function prediction (Gene Ontology terms) and obtain significant improvements in per-label AUROC and average precision for less common labels.
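The evaluation emphasizes per-label metrics for less common labels. The sketch below computes AUROC (via the Mann-Whitney rank statistic) and non-interpolated average precision for one label at a time, then averages each metric over labels whose training-set positive count is at or below a cutoff. The cutoff, the function names, and this bucketing scheme are illustrative assumptions; the paper's exact frequency bins are not stated here.

```python
from typing import Callable, Dict, Optional, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """ROC AUC via the Mann-Whitney rank statistic, averaging ranks over ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # undefined when one class is missing
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores: Sequence[float],
                      labels: Sequence[int]) -> Optional[float]:
    """Non-interpolated AP: mean of precision@k at the rank k of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        return None
    # Stable sort: tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def mean_over_rare_labels(
    score_cols: Dict[str, Sequence[float]],
    label_cols: Dict[str, Sequence[int]],
    train_counts: Dict[str, int],
    max_count: int,
    metric: Callable[[Sequence[float], Sequence[int]], Optional[float]],
) -> Optional[float]:
    """Average a per-label metric over labels with <= max_count training positives."""
    values = []
    for name, scores in score_cols.items():
        if train_counts[name] > max_count:
            continue  # not a rare label under this hypothetical cutoff
        value = metric(scores, label_cols[name])
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical predictions for two labels over four test examples.
    score_cols = {"rare": [0.9, 0.8, 0.3, 0.1], "common": [0.7, 0.2, 0.6, 0.4]}
    label_cols = {"rare": [1, 0, 1, 0], "common": [1, 0, 1, 0]}
    train_counts = {"rare": 3, "common": 500}
    print("rare AUROC:", mean_over_rare_labels(
        score_cols, label_cols, train_counts, 10, auroc))
    print("rare AP:", mean_over_rare_labels(
        score_cols, label_cols, train_counts, 10, average_precision))
```

Labels with no positives (or no negatives) in the evaluation set are skipped, since AUROC is undefined for them. This AUROC averages ranks over ties, while this average precision breaks ties by input order, so results on tied scores can differ slightly from library implementations such as scikit-learn's.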
