LG · Aug 26, 2025

Estimating Conditional Covariance between labels for Multilabel Data

arXiv:2508.18951v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable label-dependence analysis in multilabel classification. It is incremental: it compares existing models rather than introducing a new method.

The paper tackles the problem of estimating the conditional covariance between labels in multilabel data in order to assess label dependence. It finds that the three compared models (Multivariate Probit, Multivariate Bernoulli, and Staged Logit) perform similarly, but all of them falsely detect dependent covariance when only constant covariance is present; the Multivariate Probit model has the lowest error rate.
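To make the distinction between constant and dependent conditional covariance concrete, here is a minimal, hypothetical Python sketch using only the standard library. It assumes a single discrete covariate x, so that Cov(y1, y2 | x) can be estimated by stratifying on x, and it assumes "constant" means the conditional covariance is the same for every x while "dependent" means it varies with x. The generator `simulate` and its copy-with-probability-c(x) mechanism are illustrative assumptions, not any of the three models compared in the paper.

```python
import random
from collections import defaultdict


def simulate(n, p_of_x, c_of_x, seed=0):
    """Draw n hypothetical samples (x, y1, y2), x uniform on the keys of p_of_x.

    Given x, y1 ~ Bernoulli(p(x)); with probability c(x) y2 copies y1,
    otherwise y2 is an independent Bernoulli(p(x)) draw. Both labels then
    have mean p(x) and Cov(y1, y2 | x) = c(x) * p(x) * (1 - p(x)).
    """
    rng = random.Random(seed)
    xs = sorted(p_of_x)
    samples = []
    for _ in range(n):
        x = rng.choice(xs)
        p, c = p_of_x[x], c_of_x[x]
        y1 = int(rng.random() < p)
        y2 = y1 if rng.random() < c else int(rng.random() < p)
        samples.append((x, y1, y2))
    return samples


def covariance(pairs):
    """Sample covariance E[y1*y2] - E[y1]*E[y2] of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    m12 = sum(a * b for a, b in pairs) / n
    return m12 - m1 * m2


def conditional_covariances(samples):
    """Estimate Cov(y1, y2 | x) separately within each value of x."""
    groups = defaultdict(list)
    for x, y1, y2 in samples:
        groups[x].append((y1, y2))
    return {x: covariance(pairs) for x, pairs in sorted(groups.items())}


if __name__ == "__main__":
    n = 40_000
    # Labels independent given x but both driven by x:
    # population marginal covariance 0.09, conditional covariance 0.
    confounded = simulate(n, {0: 0.2, 1: 0.8}, {0: 0.0, 1: 0.0})
    # Constant conditional covariance: 0.4 * 0.25 = 0.1 for every x.
    constant = simulate(n, {0: 0.5, 1: 0.5}, {0: 0.4, 1: 0.4})
    # Dependent conditional covariance: 0.0 at x=0, 0.8 * 0.25 = 0.2 at x=1.
    dependent = simulate(n, {0: 0.5, 1: 0.5}, {0: 0.0, 1: 0.8})
    for name, data in [("confounded", confounded),
                       ("constant", constant),
                       ("dependent", dependent)]:
        marginal = covariance([(y1, y2) for _, y1, y2 in data])
        per_x = {x: round(v, 3) for x, v in conditional_covariances(data).items()}
        print(name, round(marginal, 3), per_x)
```

In the population, the constant and dependent settings both have a marginal label covariance of 0.1, and the confounded setting has 0.09 even though its labels are conditionally independent; only the per-x estimates separate the three cases. This is why label dependence has to be assessed conditionally on x rather than from the raw label values.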

Multilabel data should be analysed for label dependence before multilabel models are applied. Independence between the labels of multilabel data cannot be measured directly from the label values, because the labels depend on the set of covariates $\vec{x}$; it can, however, be measured by examining the conditional label covariance using a multivariate Probit model. Unfortunately, the multivariate Probit model provides an estimate of its copula covariance, and so it might not be reliable in estimating constant and dependent covariance. In this article, we compare three models (Multivariate Probit, Multivariate Bernoulli and Staged Logit) for estimating the constant and dependent multilabel conditional label covariance. We provide an experiment that allows us to observe each model's measurement of conditional covariance. We found that all models measure constant and dependent covariance equally well, depending on the strength of the covariance, but that all of them falsely detect dependent covariance in data where only constant covariance is present. Of the three models, the Multivariate Probit model had the lowest error rate.
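The point that the multivariate Probit model estimates a copula (latent) covariance, rather than the covariance of the observed labels, can be illustrated in the simplest possible case. The sketch below assumes two labels, no covariates, and latent thresholds at zero, all of which are simplifications chosen for illustration. It compares the latent correlation rho with the covariance and correlation of the resulting binary labels, using the standard orthant-probability formula P(z1 > 0, z2 > 0) = 1/4 + asin(rho)/(2π) together with a Monte Carlo check. The function names are hypothetical.

```python
import math
import random


def label_covariance_exact(rho):
    """Cov(y1, y2) when y_j = 1[z_j > 0] and (z1, z2) is standard bivariate
    normal with correlation rho.

    Uses P(z1 > 0, z2 > 0) = 1/4 + asin(rho) / (2*pi) and P(y_j = 1) = 1/2,
    so Cov(y1, y2) = asin(rho) / (2*pi).
    """
    return math.asin(rho) / (2 * math.pi)


def label_correlation_exact(rho):
    """Pearson correlation of the two labels; each label has variance 1/4."""
    return label_covariance_exact(rho) / 0.25


def label_covariance_simulated(rho, n=200_000, seed=1):
    """Monte Carlo estimate of the same covariance by thresholding draws."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # This construction gives unit variances and corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y1, y2 = int(z1 > 0), int(z2 > 0)
        s1 += y1
        s2 += y2
        s12 += y1 * y2
    return s12 / n - (s1 / n) * (s2 / n)


if __name__ == "__main__":
    rho = 0.5
    print("latent (copula) correlation:", rho)
    print("label covariance, exact:", round(label_covariance_exact(rho), 4))
    print("label covariance, simulated:", round(label_covariance_simulated(rho), 4))
    print("label correlation, exact:", round(label_correlation_exact(rho), 4))
```

For rho = 0.5 the exact label covariance is asin(0.5)/(2π) = 1/12 and the label correlation is 1/3, so the latent parameter and the dependence between the observed labels are different quantities, even in this covariate-free case.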

