IVAug 26, 2025
MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia PredictionPardis Moradbeiki, Nasser Ghadiri, Sayed Jalal Zahabi et al.
Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
NCFeb 9, 2021
Modeling Visual Hallucination: A Generative Adversarial Network FrameworkMasoumeh Zareh, Mohammad Hossein Manshaei, Sayed Jalal Zahabi et al.
Visual hallucination refers to the perception of recognizable things that are not present. These phenomena are commonly linked to a range of neurological/psychiatric disorders. Despite ongoing research, the mechanisms through which the visual system generates hallucinations from real-world environments are still not well understood. Abnormal interactions between different regions of the brain responsible for perception are known to contribute to the occurrence of visual hallucinations. In this study, we propose and extend a generative neural network-based framework to address challenges within the visual system, aiming to create goal-driven models inspired by neurobiological mechanisms of visual hallucinations. We focus on the adversarial interactions between the visual system and the frontal lobe regions, proposing the Hallu-GAN model to suggest how these interactions can give rise to visual hallucinations. The architecture of the Hallu-GAN model is based on generative adversarial networks. Our simulation results indicate that disturbances in the ventral stream can lead to visual hallucinations. To further analyze the impact of other brain regions on the visual system, we extend the Hallu-GAN model by adding EEG data from individuals. This extended model, referred to as Hallu-GAN+, enables the examination of both hallucinating and non-hallucinating states. By training the Hallu-GAN+ model with EEG data from an individual with Charles Bonnet syndrome, we demonstrated its utility in analyzing the behavior of those experiencing hallucinations. Our simulation results confirmed the capability of the proposed model in resembling the visual system in both healthy and hallucinating states.
CLMay 11, 2020
Evaluating Sparse Interpretable Word Embeddings for Biomedical DomainMohammad Amin Samadi, Mohammad Sadegh Akhondzadeh, Sayed Jalal Zahabi et al.
Word embeddings have found their way into a wide range of natural language processing tasks including those in the biomedical domain. While these vector representations successfully capture semantic and syntactic word relations, hidden patterns and trends in the data, they fail to offer interpretability. Interpretability is a key means to justification which is an integral part when it comes to biomedical applications. We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods. Qualitative and quantitative measurements and metrics for interpretability of word vector representations are provided. For the quantitative evaluation, we introduce an extensive categorized dataset that can be used to quantify interpretability based on category theory. Intrinsic and extrinsic evaluation of the studied methods are also presented. As for the latter, we propose datasets which can be utilized for effective extrinsic evaluation of word vectors in the biomedical domain. Based on our experiments, it is seen that sparse word vectors show far more interpretability while preserving the performance of their original vectors in downstream tasks.