Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
For clinical researchers and practitioners, this method improves multi-outcome prediction from multimodal data by reducing negative transfer and isolating task-specific signals.
The paper proposes a multi-task framework with Orthogonal Task Decomposition (OrthTD) to disentangle shared and task-specific representations from multimodal clinical data. On a cohort of 12,430 surgical patients, with four outcomes to predict, OrthTD achieved an average AUC of 87.5% and an average AUPRC of 37.2%, outperforming advanced tabular and multi-task baselines.
Real-world clinical data is inherently multimodal and offers complementary evidence, and clinical practice routinely requires assessing multiple related outcomes jointly. Although multi-task learning can improve efficiency by sharing information across outcomes, existing approaches often fail to balance shared representation learning with outcome-specific modeling. Hard parameter sharing can trigger negative transfer when task gradients conflict, while more flexible sharing schemes may still entangle shared and task-specific signals. To address this, we propose a multi-task framework built on a unified Transformer for multimodal fusion and augmented with Orthogonal Task Decomposition (OrthTD), which splits patient representations into shared and task-specific subspaces and imposes a geometric orthogonality constraint between them to reduce redundancy and isolate task-specific signals. We evaluated OrthTD on the task of predicting four outcomes in a real-world cohort of 12,430 surgical patients. OrthTD achieved an average AUC (area under the receiver operating characteristic curve) of 87.5% and an average AUPRC (area under the precision-recall curve) of 37.2%, consistently outperforming advanced tabular and multi-task methods. Notably, OrthTD achieved substantial gains in AUPRC, indicating better identification of rare events in imbalanced clinical data. These results suggest that enforcing non-redundant shared and task-specific representations can improve multi-outcome prediction from multimodal clinical data.
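The abstract does not give the exact form of the orthogonality constraint. One common way to express such a constraint, used in domain-separation-style models, is the squared Frobenius norm of the cross-product between the batch matrix of shared features and each task's matrix of task-specific features. The sketch below assumes that form; the function names `transpose_matmul` and `orthogonality_penalty` are hypothetical, and this is an illustration of the idea rather than the authors' implementation. Plain Python is used only to show the arithmetic; a real model would compute this with a tensor library so gradients flow through the penalty.

```python
"""Hypothetical sketch of an orthogonality penalty between shared and
task-specific representation subspaces (assumed form, not the paper's loss)."""

from typing import List

Matrix = List[List[float]]  # rows = patients in a batch, cols = feature dims


def transpose_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T @ b for a (n x d1) and b (n x d2); the result is d1 x d2."""
    n = len(a)
    if n != len(b):
        raise ValueError("a and b must have the same number of rows (patients)")
    d1, d2 = len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(d2)]
        for i in range(d1)
    ]


def orthogonality_penalty(shared: Matrix, task_specific: List[Matrix]) -> float:
    """Sum over tasks of ||H_shared^T H_task||_F^2.

    The penalty is zero exactly when every shared feature column is
    orthogonal (over the batch) to every task-specific feature column
    of every task; overlap between the subspaces makes it positive.
    """
    total = 0.0
    for h_task in task_specific:
        cross = transpose_matmul(shared, h_task)
        total += sum(v * v for row in cross for v in row)
    return total


# Toy batch of 3 patients: 2-dim shared features, 1-dim task-specific features.
shared = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
task_a = [[0.0], [0.0], [1.0]]  # orthogonal to both shared columns
task_b = [[1.0], [1.0], [0.0]]  # overlaps with the shared subspace
print(orthogonality_penalty(shared, [task_a]))  # 0.0
print(orthogonality_penalty(shared, [task_b]))  # 2.0
```

Under this assumption, the penalty would be added to the sum of the per-outcome prediction losses with a weighting hyperparameter (hypothetical here), so that the model is pushed to keep the shared and task-specific subspaces non-redundant while still fitting each outcome.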