MGA: Medical generalist agent through text-guided knowledge transformation
This work addresses the need for adaptable, less complex models in clinical applications, though it appears incremental: it builds on existing multi-modal methods and adds a novel guidance approach.
The paper tackles a limitation of multi-modal medical representation methods: they require additional training branches for downstream tasks, which increases model complexity and introduces bias. It proposes MGA, a medical generalist agent that addresses three clinical tasks via text-guided knowledge transformation from clinical reports, and reports promising results on four X-ray datasets.
Multi-modal representation methods have achieved advanced performance in medical applications by extracting more robust features from multi-domain data. However, existing methods usually need to train additional branches for downstream tasks, which may increase model complexity in clinical applications and introduce additional human inductive bias. Moreover, very few studies exploit the rich clinical knowledge embedded in daily clinical reports. To this end, we propose a novel medical generalist agent, MGA, that can address three kinds of common clinical tasks via knowledge transformation from clinical reports. Unlike existing methods, MGA can easily adapt to different tasks without task-specific downstream branches when the corresponding annotations are missing. More importantly, this work is the first attempt to use professional medical language guidance as a transmission medium to guide the agent's behavior. The proposed method is evaluated on four well-known open-source X-ray datasets: MIMIC-CXR, CheXpert, MIMIC-CXR-JPG, and MIMIC-CXR-MS. The promising results validate the effectiveness of the proposed MGA. Code is available at: https://github.com/SZUHvern/MGA