LG AI · Dec 18, 2022

Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

Firas Khader, Gustav Mueller-Franzes, Tianci Wang, Tianyu Han, Soroosh Tayebi Arasteh, Christoph Haarburger, Johannes Stegmaier, Keno Bressem, Christiane Kuhl, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn

arXiv:2212.09162v26.99 citations · h-index: 59 · Has Code

Originality: Highly original

AI Analysis

This paper addresses the challenge of scaling multimodal models to clinical routine data, with the aim of enabling more accurate diagnosis in medical applications.

The paper tackles the scaling problem in multimodal deep learning for medical diagnosis by introducing 'learnable synergies' to select relevant interactions between data modalities, demonstrating improved performance on large radiology and ophthalmology datasets.

Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.
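The scaling argument in the abstract can be made concrete with a small sketch. It is not the authors' implementation: the abstract does not describe the architecture in detail, so the code below assumes a generic design in which a small, fixed-size memory reads from the input tokens via cross-attention instead of letting every token attend to every other token. The token counts, the memory size, and the function names (`full_pairwise_cost`, `memory_cost`, `memory_read`) are all hypothetical, and a real model would use learned projections rather than raw dot products.

```python
import math


def full_pairwise_cost(tokens_per_modality):
    """Attention scores needed when every token attends to every token."""
    n = sum(tokens_per_modality)
    return n * n


def memory_cost(tokens_per_modality, memory_size):
    """Attention scores needed when a fixed-size memory reads all tokens."""
    return memory_size * sum(tokens_per_modality)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(memory, tokens):
    """One cross-attention step (assumed design, no learned weights):
    each memory slot becomes a softmax-weighted average of the tokens."""
    dim = len(memory[0])
    scale = math.sqrt(dim)
    updated = []
    for slot in memory:
        scores = [sum(q * k for q, k in zip(slot, tok)) / scale
                  for tok in tokens]
        weights = softmax(scores)
        updated.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                        for d in range(dim)])
    return updated


# Hypothetical token counts: one image and a handful of clinical values.
counts = [1024, 64]
full = full_pairwise_cost(counts)
compact = memory_cost(counts, memory_size=32)
print(full, compact)  # 1183744 34816
```

Under these assumed numbers, the pairwise scheme computes 1,183,744 attention scores while the memory-based scheme computes 34,816, and the gap widens as more modalities or tokens are added, because the first grows quadratically in the total token count and the second only linearly.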
