Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment
This work addresses stroke treatment outcome prediction for clinicians, but it is incremental as it applies existing Transformer methods to multimodal medical data.
The study predicts functional outcomes of stroke treatment by combining non-contrast CT images and discharge reports in a Transformer-based multimodal fusion framework, and finds that the multimodal combination outperforms either single modality.
This study proposes Multitrans, a multimodal fusion framework based on the Transformer architecture and its self-attention mechanism. The framework combines non-contrast computed tomography (NCCT) images with the discharge diagnosis reports of patients undergoing stroke treatment, and evaluates several Transformer-based approaches for predicting the functional outcomes of that treatment. The results show that single-modal text classification performs significantly better than single-modal image classification, and that the multimodal combination outperforms either single modality. Although the Transformer model performs worse on imaging data alone, combining the images with clinical meta-diagnostic information lets the model learn complementary information from both modalities, which contributes to more accurate prediction of stroke treatment outcomes.
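The general idea of fusing two modalities through self-attention can be illustrated with a minimal sketch. This is not the Multitrans implementation, whose details are not given here. It assumes each modality has already been encoded into a fixed-length embedding, treats the two embeddings as a two-token sequence, applies single-head scaled dot-product self-attention with identity projections (no learned weights), mean-pools the result, and passes it to a logistic head. The function names, the toy vectors, and the binary outcome label are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so nothing here is learned.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        )
    return attended


def fuse(image_emb, text_emb):
    """Treat each modality embedding as one token, attend, then mean-pool."""
    attended = self_attention([image_emb, text_emb])
    dim = len(image_emb)
    return [sum(tok[j] for tok in attended) / len(attended) for j in range(dim)]


def predict(fused, weights, bias):
    """Logistic head; the binary outcome label is a hypothetical simplification."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy embeddings standing in for an NCCT image encoder and a report text encoder.
image_embedding = [1.0, 0.0, 0.5, -0.5]
text_embedding = [0.2, 0.8, -0.1, 0.4]
fused_embedding = fuse(image_embedding, text_embedding)
probability = predict(fused_embedding, [0.1, -0.2, 0.3, 0.05], 0.0)
print(fused_embedding, probability)
```

In a trained model the query, key, and value projections and the classification head would be learned, and each modality would usually contribute many tokens (image patches, word pieces) rather than a single vector.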