Jagath C. Rajapakse

CV
h-index 16
18 papers
139 citations
Novelty 44%
AI Score 54

18 Papers

IV Jul 31, 2023
Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer's Disease from sMRI and PET Scans

Yanteng Zhanga, Xiaohai He, Yi Hao Chan et al.

In recent years, deep learning models have been applied to neuroimaging data for early diagnosis of Alzheimer's disease (AD). Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET) images provide structural and functional information about the brain, respectively. Combining these features yields better performance than using a single modality alone when building predictive models for AD diagnosis. However, current multi-modal deep learning approaches based on sMRI and PET are mostly limited to convolutional neural networks, which do not facilitate integration of both image and phenotypic information of subjects. We propose to use graph neural networks (GNN), which are designed to deal with problems in non-Euclidean domains. In this study, we demonstrate how brain networks can be created from sMRI or PET images and used in a population graph framework that combines phenotypic information with imaging features of these brain networks. Then, we present a multi-modal GNN framework in which each modality has its own GNN branch, and we propose a technique to combine the multi-modal data at the level of both node vectors and adjacency matrices. Finally, we perform late fusion to combine the preliminary decisions made in each branch and produce a final prediction. As multi-modal data becomes available, AD diagnosis is trending toward multi-source, multi-modal methods. We conducted exploratory experiments based on multi-modal imaging data combined with non-imaging phenotypic information for AD diagnosis and analyzed the impact of phenotypic information on diagnostic performance. Experimental results demonstrate that our proposed multi-modal approach improves performance for AD diagnosis; this study also provides a technical reference and supports the need for multivariate multi-modal diagnosis methods.
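The late-fusion step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each modality branch emits class probabilities and that fusion is a weighted average; the abstract does not state the fusion rule, so `late_fusion`, its weights, and the toy probabilities are hypothetical.

```python
def late_fusion(branch_probs, weights=None):
    """Combine per-branch class probabilities by a weighted average.

    Assumption: the fusion rule is a weighted mean; the paper may use
    a different rule.
    """
    n_branches = len(branch_probs)
    if weights is None:
        # Default to equal trust in every branch.
        weights = [1.0 / n_branches] * n_branches
    n_classes = len(branch_probs[0])
    fused = [0.0] * n_classes
    for probs, w in zip(branch_probs, weights):
        for c in range(n_classes):
            fused[c] += w * probs[c]
    total = sum(fused)
    # Renormalise so the fused scores sum to one.
    return [p / total for p in fused]


# Hypothetical outputs of the sMRI and PET branches for one subject.
smri_probs = [0.30, 0.70]
pet_probs = [0.60, 0.40]
fused = late_fusion([smri_probs, pet_probs])
prediction = max(range(len(fused)), key=fused.__getitem__)
```

With unequal weights the same function can express greater trust in one modality, which is one simple way to study each branch's contribution.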

NC Apr 23 Code
Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

Deepank Girish, Yi Hao Chan, Sukrit Gupta et al.

Several brain foundation models (FM) have recently been proposed to predict brain disorders by modelling dynamic functional connectivity (FC). While they demonstrate remarkable model performance and zero- or few-shot generalization, the salient features identified as potential biomarkers are yet to be thoroughly evaluated. We propose RE-CONFIRM, a framework for evaluating the robustness of potential biomarker candidates elucidated by deep learning (DL) models including FMs. From experiments on five large datasets of Autism Spectrum Disorder (ASD), Attention-deficit Hyperactivity Disorder (ADHD), and Alzheimer's Disease (AD), we found that although commonly used performance metrics provide an intuitive assessment of model predictions, they are insufficient for evaluating the robustness of biomarkers identified by these models. RE-CONFIRM metrics revealed that simply finetuning FMs leads to models that fail to capture regional hubs effectively, even in disorders where hubs are known to be implicated, such as ASD and ADHD. In view of this, we propose Hub-LoRA (Low-Rank Adaptation) as a fine-tuning technique that enables FMs to not only outperform customised DL models but also produce neurobiologically faithful biomarkers supported by meta-analyses. RE-CONFIRM is generalizable and can be easily applied to ascertain the robustness of DL models trained on functional MRI datasets. Code is available at: https://github.com/SCSE-Biomedical-Computing-Group/RE-CONFIRM.
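For readers unfamiliar with low-rank adaptation, the sketch below shows the generic LoRA update on a single linear layer. It does not reproduce the hub-aware part of Hub-LoRA, which the abstract does not describe; all shapes, the scaling `alpha`, and the function name `lora_forward` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 2, 4.0  # hypothetical sizes

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-initialised


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen linear layer plus a scaled low-rank correction B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(3, d_in))
y0 = lora_forward(x, W, A, B, alpha, rank)

# Only A and B are updated during fine-tuning.
n_trainable = A.size + B.size
n_frozen = W.size
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and fine-tuning touches far fewer parameters than the frozen weight matrix holds.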

CV Aug 25, 2023
Self-supervised learning for hotspot detection and isolation from thermal images

Shreyas Goyal, Jagath C. Rajapakse

Hotspot detection using thermal imaging has recently become essential in several industrial applications, such as security, health, and equipment monitoring. It is of utmost importance in industrial safety, where equipment can develop anomalies, and hotspots are early indicators of such anomalies. We address the problem of hotspot detection in thermal images by proposing a self-supervised learning approach. Self-supervised learning has shown potential as a competitive alternative to its supervised counterparts, but its application to thermography has been limited by the lack of diverse data, domain-specific pre-trained models, and standardized benchmarks. We propose a self-supervised representation learning approach followed by fine-tuning that improves detection of hotspots by classification. A SimSiam-based ensemble classifier decides whether an image contains hotspots. Detection of hotspots is followed by precise hotspot isolation. By doing so, we provide accurate and precise hotspot identification applicable to a wide range of applications. We created a novel large thermal image dataset to address the paucity of easily accessible thermal images. Our experiments with this dataset and a publicly available segmentation dataset show the potential of our approach for hotspot detection and its ability to isolate hotspots with high accuracy. We achieve a Dice coefficient of 0.736, the highest when compared with existing hotspot identification techniques. Our experiments also show self-supervised learning to be a strong contender to supervised learning, providing competitive metrics for hotspot detection, with the highest accuracy of our approach being 97%.
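The Dice coefficient reported above measures overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch follows; the masks are toy data, and returning 1.0 for two empty masks is a convention assumed here, not something the abstract specifies.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 1 marks a hotspot pixel.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
```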

CL Oct 7, 2023
Integrating Contrastive Learning into a Multitask Transformer Model for Effective Domain Adaptation

Chung-Soo Ahn, Jagath C. Rajapakse, Rajib Rana

While speech emotion recognition (SER) research has made significant progress, achieving generalization across various corpora continues to pose a problem. We propose a novel domain adaptation technique that embodies a multitask framework with SER as the primary task, and contrastive learning and information maximisation loss as auxiliary tasks, underpinned by fine-tuning of transformers pre-trained on large language models. Empirical results from experiments on well-established datasets such as IEMOCAP and MSP-IMPROV illustrate that our proposed model achieves state-of-the-art performance in SER within cross-corpus scenarios.

IV May 17, 2025 Code
Bridging the Inter-Domain Gap through Low-Level Features for Cross-Modal Medical Image Segmentation

Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu et al.

This paper addresses the task of cross-modal medical image segmentation by exploring unsupervised domain adaptation (UDA) approaches. We propose a model-agnostic UDA framework, LowBridge, which builds on a simple observation that cross-modal images share some similar low-level features (e.g., edges) as they are depicting the same structures. Specifically, we first train a generative model to recover the source images from their edge features, followed by training a segmentation model on the generated source images, separately. At test time, edge features from the target images are input to the pretrained generative model to generate source-style target domain images, which are then segmented using the pretrained segmentation network. Despite its simplicity, extensive experiments on various publicly available datasets demonstrate that LowBridge achieves state-of-the-art performance, outperforming eleven existing UDA approaches under different settings. Notably, further ablation studies show that LowBridge is agnostic to different types of generative and segmentation models, suggesting its potential to be seamlessly plugged with the most advanced models to achieve even more outstanding results in the future. The code is available at https://github.com/JoshuaLPF/LowBridge.
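The test-time flow described in this abstract (target image, then edges, then source-style image, then segmentation) can be sketched as below. The gradient-magnitude edge extractor, the function names, and the stand-in generator and segmenter are all assumptions for illustration; the abstract does not say which edge detector or networks are used.

```python
import numpy as np


def edge_map(image):
    """Gradient-magnitude edges (assumed stand-in for the paper's edge features)."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)


def segment_target(target_image, generator, segmenter):
    """Test-time pipeline: edges -> source-style image -> segmentation mask."""
    edges = edge_map(target_image)
    source_style = generator(edges)
    return segmenter(source_style)


# Toy "modalities": the same square structure with inverted contrast.
square = np.zeros((8, 8))
square[2:6, 2:6] = 1.0
inverted = 1.0 - square

# Hypothetical placeholders for the pretrained generative and segmentation models.
identity_generator = lambda edges: edges
threshold_segmenter = lambda img: img > 0

mask = segment_target(inverted, identity_generator, threshold_segmenter)
same_edges = np.allclose(edge_map(square), edge_map(inverted))
```

The toy example shows why edges are a plausible bridge: inverting the contrast changes every pixel intensity but leaves the edge map unchanged.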

CV Nov 6, 2024 Code
Efficient Fourier Filtering Network with Contrastive Learning for AAV-based Unaligned Bimodal Salient Object Detection

Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu et al.

Autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing AAV-based BSOD models limits their applicability to real-world AAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical filtering mechanism. Our proposed model, AlignSal, reduces the number of parameters by 70.0%, decreases the floating point operations by 49.4%, and increases the inference speed by 152.5% compared to the cutting-edge BSOD model (i.e., MROS). Extensive experiments on the AAV RGB-T 2400 and seven bi-modal dense prediction datasets demonstrate that AlignSal achieves both real-time inference speed and better performance and generalizability compared to nineteen state-of-the-art models across most evaluation metrics. In addition, our ablation studies further verify AlignSal's potential in boosting the performance of existing aligned BSOD models on AAV-based unaligned data. The code is available at: https://github.com/JoshuaLPF/AlignSal.
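The idea of obtaining global context through the Fourier domain can be sketched as an element-wise filter applied to the 2D spectrum of a feature map. This is a generic Fourier filtering layer, not AlignSal's synchronized alignment fusion; `fourier_filter` and the filter shapes are assumptions, and in a trained model `freq_weights` would be learned.

```python
import numpy as np


def fourier_filter(feature, freq_weights):
    """Filter a (C, H, W) feature map by element-wise scaling of its 2D spectrum."""
    spectrum = np.fft.fft2(feature, axes=(-2, -1))
    filtered = spectrum * freq_weights
    # Every output pixel depends on every input pixel after the inverse FFT.
    return np.fft.ifft2(filtered, axes=(-2, -1)).real


rng = np.random.default_rng(0)
feature = rng.normal(size=(4, 16, 16))

# An all-ones filter leaves the feature map unchanged.
identity = np.ones((4, 16, 16))
out_identity = fourier_filter(feature, identity)

# Keeping only the zero-frequency bin yields the per-channel mean everywhere.
dc_only = np.zeros((4, 16, 16))
dc_only[:, 0, 0] = 1.0
out_dc = fourier_filter(feature, dc_only)
```

A single multiplication in the frequency domain mixes information across the whole spatial extent, which is the property that makes such layers attractive as a cheaper alternative to attention.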

IV Aug 15, 2025 Code
Subcortical Masks Generation in CT Images via Ensemble-Based Cross-Domain Label Transfer

Augustine X. W. Lee, Pak-Hei Yeung, Jagath C. Rajapakse

Subcortical segmentation in neuroimages plays an important role in understanding brain anatomy and facilitating computer-aided diagnosis of traumatic brain injuries and neurodegenerative disorders. However, training accurate automatic models requires large amounts of labelled data. Although public subcortical segmentation datasets exist for Magnetic Resonance Imaging (MRI), a significant gap remains for Computed Tomography (CT). This paper proposes an automatic ensemble framework to generate high-quality subcortical segmentation labels for CT scans by leveraging existing MRI-based models. We introduce a robust ensembling pipeline to integrate them and apply it to unannotated paired MRI-CT data, resulting in a comprehensive CT subcortical segmentation dataset. Extensive experiments on multiple public datasets demonstrate the superior performance of our proposed framework. Furthermore, using our generated CT dataset, we train segmentation models that achieve improved performance on related segmentation tasks. To facilitate future research, we make our source code, generated dataset, and trained models publicly available at https://github.com/SCSE-Biomedical-Computing-Group/CT-Subcortical-Segmentation, marking the first open-source release for CT subcortical segmentation to the best of our knowledge.

CV Nov 27, 2024 Code
Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Pengfei Lyu, Xiaosheng Yu, Pak-Hei Yeung et al.

The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing Transformer-based RGB-T SOD models with quadratic complexity are memory-intensive, limiting their application in high-resolution bimodal feature fusion. To overcome this limitation, we propose a purely Fourier Transform-based model, namely Deep Fourier-embedded Network (FreqSal), for accurate RGB-T SOD. Specifically, we leverage the efficiency of Fast Fourier Transform with linear complexity to design three key components: (1) To fuse RGB and thermal modalities, we propose Modal-coordinated Perception Attention, which aligns and enhances bimodal Fourier representation in multiple dimensions; (2) To clarify object edges and suppress noise, we design Frequency-decomposed Edge-aware Block, which deeply decomposes and filters Fourier components of low-level features; (3) To accurately decode features, we propose Fourier Residual Channel Attention Block, which prioritizes high-frequency information while aligning channel-wise global relationships. Additionally, even when converged, existing deep learning-based SOD models' predictions still exhibit frequency gaps relative to ground-truth. To address this problem, we propose Co-focus Frequency Loss, which dynamically weights hard frequencies during edge frequency reconstruction by cross-referencing bimodal edge information in the Fourier domain. Extensive experiments on ten bimodal SOD benchmark datasets demonstrate that FreqSal outperforms twenty-nine existing state-of-the-art bimodal SOD models. Comprehensive ablation studies further validate the value and effectiveness of our newly proposed components. The code is available at https://github.com/JoshuaLPF/FreqSal.
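The notion of up-weighting hard frequencies in a loss can be sketched with a generic frequency-domain objective. This is not the Co-focus Frequency Loss itself, whose bimodal cross-referencing the abstract does not detail; the weighting rule, the exponent `alpha`, and the function name are assumptions.

```python
import numpy as np


def frequency_weighted_loss(pred, target, alpha=1.0):
    """Spectral squared error, with larger weights on frequencies that are far off."""
    diff = np.fft.fft2(pred, norm="ortho") - np.fft.fft2(target, norm="ortho")
    dist = np.abs(diff) ** 2
    # Weight grows with the per-frequency error, then is scaled into [0, 1].
    weights = np.sqrt(dist) ** alpha
    max_w = weights.max()
    if max_w > 0:
        weights = weights / max_w
    return float(np.mean(weights * dist))


rng = np.random.default_rng(0)
target = rng.normal(size=(16, 16))
pred = target + 0.1 * rng.normal(size=(16, 16))

loss_perfect = frequency_weighted_loss(target, target)
loss_noisy = frequency_weighted_loss(pred, target)
```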

LG May 9
Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Siddhant Dutta, Edward Tan Beng Wai, Soumick Sarker et al.

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret as structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. Across enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues, spatially localized functional clusters, and catalytic contact patterns. On binding-site detection, SoftBlobGIN improves residue AUROC from $0.885$ using an ESM-2 linear probe to $0.983$, indicating that these structural explanations are not recoverable from language-model features alone. Learned blob partitions provide an additional layer of interpretability by automatically grouping residues into functional substructures, with blobs containing annotated active-site residues showing $1.85\times$ higher importance than other blobs ($\rho = 0.339$, $p = 0.009$), without any active-site supervision. Our framework requires no retraining of the language model, adds only $\sim$1.1M parameters, and generalises across ProteinShake tasks, achieving $F_{\max}$ of $0.733$ on Gene Ontology prediction and AUROC of $0.969$ on binding-site detection. We position this as an interpretable structural companion to protein language models that makes their predictions more transparent and auditable.
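Differentiable Gumbel-softmax pooling can be sketched as a soft assignment of residues to substructures followed by a weighted sum of residue features. The sizes, temperature, and variable names below are hypothetical, and the sketch omits the GIN message passing and the ESM-2 embeddings that the abstract's model uses.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Sample relaxed one-hot assignments; lower tau gives sharper assignments."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_residues, dim, n_blobs = 10, 6, 3  # hypothetical sizes

node_feats = rng.normal(size=(n_residues, dim))         # stand-in for residue features
assign_logits = rng.normal(size=(n_residues, n_blobs))  # would be learned in practice

# S[i, k] is the soft membership of residue i in blob k.
S = gumbel_softmax(assign_logits, tau=0.5, rng=rng)

# Pool residue features into one feature vector per blob.
blob_feats = S.T @ node_feats
hard_assignment = S.argmax(axis=1)
```

Reading off `hard_assignment` gives a partition of residues into blobs, which is the kind of grouping the abstract uses for interpretation.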

LG May 1, 2024
Discovering robust biomarkers of psychiatric disorders from resting-state functional MRI via graph neural networks: A systematic review

Yi Hao Chan, Deepank Girish, Sukrit Gupta et al.

Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. However, existing methods of evaluating their robustness are often limited to cross-referencing with existing literature, which is a subjective and inconsistent process. In this review, we provide an overview of how GNN and model explainability techniques (specifically, feature attributors) have been applied to fMRI datasets for disorder prediction tasks, with an emphasis on evaluating the robustness of potential biomarkers produced for psychiatric disorders. Then, 65 studies using GNNs that reported potential fMRI biomarkers for psychiatric disorders (attention-deficit hyperactivity disorder, autism spectrum disorder, major depressive disorder, schizophrenia) published before 9 October 2024 were identified from 2 online databases (Scopus, PubMed). We found that while most studies have performant models, salient features highlighted in these studies (as determined by feature attribution scores) vary greatly across studies on the same disorder. Reproducibility of biomarkers is limited to a small subset at the level of regions, and few transdiagnostic biomarkers were identified. To address these issues, we suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of these potential biomarkers. We further highlight gaps in the existing literature and put together a prediction-attribution-evaluation framework that could set the foundations for future research on discovering robust biomarkers of psychiatric disorders via GNNs.

BM Dec 10, 2024
Pharmacophore-guided de novo drug design with diffusion bridge

Conghao Wang, Jagath C. Rajapakse

De novo design of bioactive drug molecules with potential to treat desired biological targets is a profound task in the drug discovery process. Existing approaches tend to leverage the pocket structure of the target protein to condition the molecule generation. However, even the pocket area of the target protein may contain redundant information, since not all atoms in the pocket are responsible for the interaction with the ligand. In this work, we propose PharmacoBridge, a pharmacophore-guided de novo design approach to generate drug candidates inducing desired bioactivity via diffusion bridge. Our method adapts the diffusion bridge to convert spatial pharmacophore arrangements into molecular structures through SE(3)-equivariant transformations, providing sophisticated control over optimal biochemical feature arrangements on the generated molecules. PharmacoBridge is demonstrated to generate hit candidates that exhibit high binding affinity with potential protein targets.

CV Oct 21, 2025
FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning

Yubin Zheng, Pak-Hei Yeung, Jing Xia et al.

Federated learning (FL) enables multiple clients to collaboratively train machine learning models without exposing local data, balancing performance and privacy. However, domain shift and label heterogeneity across clients often hinder the generalization of the aggregated global model. Recently, large-scale vision-language models like CLIP have shown strong zero-shot classification capabilities, raising the question of how to effectively fine-tune CLIP across domains in a federated setting. In this work, we propose an adaptive federated prompt tuning framework, FedDEAP, to enhance CLIP's generalization in multi-domain scenarios. Our method includes the following three key components: (1) To mitigate the loss of domain-specific information caused by label-supervised tuning, we disentangle semantic and domain-specific features in images by using semantic and domain transformation networks with unbiased mappings; (2) To preserve domain-specific knowledge during global prompt aggregation, we introduce a dual-prompt design with a global semantic prompt and a local domain prompt to balance shared and personalized information; (3) To maximize the inclusion of semantic and domain information from images in the generated text features, we align textual and visual representations under the two learned transformations to preserve semantic and domain consistency. Theoretical analysis and extensive experiments on four datasets demonstrate the effectiveness of our method in enhancing the generalization of CLIP for federated image recognition across multiple domains.
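The dual-prompt idea (a shared global prompt that is aggregated, plus a local prompt that never leaves the client) can be sketched as follows. The size-weighted averaging rule, the prompt shapes, and all names are assumptions; the abstract does not specify how FedDEAP aggregates its global prompt.

```python
import numpy as np


def aggregate_global_prompts(client_prompts, client_sizes):
    """Size-weighted average of the clients' global prompts (FedAvg-style assumption)."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_prompts)  # (n_clients, n_tokens, dim)
    return np.tensordot(weights, stacked, axes=1)


# Two hypothetical clients, each holding a global and a local prompt.
clients = [
    {"global": np.full((4, 8), 1.0), "local": np.full((4, 8), -1.0), "n": 100},
    {"global": np.full((4, 8), 3.0), "local": np.full((4, 8), 5.0), "n": 300},
]

new_global = aggregate_global_prompts(
    [c["global"] for c in clients], [c["n"] for c in clients]
)

# Only the global prompt is overwritten; local prompts stay on each client.
for c in clients:
    c["global"] = new_global.copy()
```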

SD Oct 11, 2025
Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model

Chung-Soo Ahn, Rajib Rana, Sunil Sivadas et al.

Although speech emotion recognition (SER) research has advanced thanks to deep learning methods, it still suffers from the difficulty of obtaining large, quality-labelled training data. Data augmentation methods have been used to mitigate this issue, and generative models have recently shown success among them. We propose a data augmentation framework that is aided by cross-modal information transfer and mutual information regularization. A mutual information based metric can serve as an indicator of quality. Furthermore, we expand the scope of this data augmentation to multimodal inputs, thanks to mutual information ensuring dependency between modalities. Our framework was tested on three benchmark datasets: IEMOCAP, MSP-IMPROV and MSP-Podcast. The implementation was designed to generate input features that are fed into the last layer for emotion classification. Our framework improved emotion prediction performance over existing works. We also found that our framework is able to generate new inputs without any cross-modal information.

CV Oct 10, 2025
Polar Separable Transform for Efficient Orthogonal Rotation-Invariant Image Representation

Satya P. Singh, Rashmi Chaudhry, Anand Srivastava et al.

Orthogonal moment-based image representations are fundamental in computer vision, but classical methods suffer from high computational complexity and numerical instability at large orders. Zernike and pseudo-Zernike moments, for instance, require coupled radial-angular processing that precludes efficient factorization, resulting in $\mathcal{O}(n^3N^2)$ to $\mathcal{O}(n^6N^2)$ complexity and $\mathcal{O}(N^4)$ condition number scaling for the $n$th-order moments on an $N\times N$ image. We introduce PSepT (Polar Separable Transform), a separable orthogonal transform that overcomes the non-separability barrier in polar coordinates. PSepT achieves complete kernel factorization via tensor-product construction of Discrete Cosine Transform (DCT) radial bases and Fourier harmonic angular bases, enabling independent radial and angular processing. This separable design reduces computational complexity to $\mathcal{O}(N^2 \log N)$, memory requirements to $\mathcal{O}(N^2)$, and condition number scaling to $\mathcal{O}(\sqrt{N})$, representing exponential improvements over polynomial approaches. PSepT exhibits orthogonality, completeness, energy conservation, and rotation-covariance properties. Experimental results demonstrate better numerical stability, computational efficiency, and competitive classification performance on structured datasets, while preserving exact reconstruction. The separable framework enables high-order moment analysis previously infeasible with classical methods, opening new possibilities for robust image analysis applications.
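The tensor-product construction can be sketched as an orthonormal DCT along the radial axis followed by a unitary DFT along the angular axis. The sketch assumes the image has already been resampled onto a regular polar grid and ignores any radial weighting; the grid size and the function names are hypothetical, and the paper's exact bases may differ.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    mat[0, :] /= np.sqrt(2.0)
    return mat


def polar_separable_transform(polar_image):
    """DCT over radius (axis 0), then unitary DFT over angle (axis 1)."""
    n_r, n_theta = polar_image.shape
    radial = dct_matrix(n_r) @ polar_image
    return np.fft.fft(radial, axis=1) / np.sqrt(n_theta)


rng = np.random.default_rng(0)
polar = rng.normal(size=(16, 32))  # toy image sampled on an (r, theta) grid
coeffs = polar_separable_transform(polar)

# A rotation of the image is a circular shift along the angular axis.
rotated = np.roll(polar, 5, axis=1)
coeffs_rot = polar_separable_transform(rotated)

energy_in = np.sum(polar ** 2)
energy_out = np.sum(np.abs(coeffs) ** 2)
```

Because both factors are unitary, the energy is preserved, and because a rotation only changes the phase of the angular harmonics, the coefficient magnitudes are unchanged by rotation.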

CL Nov 8, 2018
Marshall-Olkin Power-Law Distributions in Length-Frequency of Entities

Xiaoshi Zhong, Xiang Yu, Erik Cambria et al.

Entities involve important concepts with concrete meanings and play important roles in numerous linguistic tasks. Entities have different forms in different linguistic tasks and researchers treat those different forms as different concepts. In this paper, we are curious to know whether there are some common characteristics that connect those different forms of entities. Specifically, we investigate the underlying distributions of entities from different types and different languages, trying to figure out some common characteristics behind those diverse entities. After analyzing twelve datasets about different types of entities and eighteen datasets about entities in different languages, we find that while these entities are dramatically diverse from each other in many aspects, their length-frequencies can be well characterized by a family of Marshall-Olkin power-law (MOPL) distributions. We conduct experiments on those thirty datasets about entities in different types and different languages, and experimental results demonstrate that MOPL models characterize the length-frequencies of entities much better than two state-of-the-art power-law models and an alternative log-normal model. Experimental results also demonstrate that MOPL models are scalable to the length-frequency of entities in large-scale real-world datasets.

CL Oct 16, 2018
Large Language Models for Few-Shot Named Entity Recognition

Yufei Zhao, Xiaoshi Zhong, Erik Cambria et al.

Named entity recognition (NER) is a fundamental task in numerous downstream applications. Recently, researchers have employed pre-trained language models (PLMs) and large language models (LLMs) to address this task. However, fully leveraging the capabilities of PLMs and LLMs with minimal human effort remains challenging. In this paper, we propose GPT4NER, a method that prompts LLMs to resolve the few-shot NER task. GPT4NER constructs effective prompts using three key components: entity definition, few-shot examples, and chain-of-thought. By prompting LLMs with these effective prompts, GPT4NER transforms few-shot NER, which is traditionally considered a sequence-labeling problem, into a sequence-generation problem. We conduct experiments on two benchmark datasets, CoNLL2003 and OntoNotes5.0, and compare the performance of GPT4NER to representative state-of-the-art models in both few-shot and fully supervised settings. Experimental results demonstrate that GPT4NER achieves an $F_1$ of 83.15% on CoNLL2003 and 70.37% on OntoNotes5.0, significantly outperforming few-shot baselines by an average margin of 7 points. Compared to fully supervised baselines, GPT4NER achieves 87.9% of their best performance on CoNLL2003 and 76.4% of their best performance on OntoNotes5.0. We also utilize a relaxed-match metric for evaluation and report performance in the sub-task of named entity extraction (NEE), and experiments demonstrate their usefulness in better understanding model behaviors in the NER task.
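The three prompt components named above (entity definitions, few-shot examples, and chain-of-thought) could be assembled roughly as follows. The wording, field names, and layout are hypothetical and are not the prompt templates used by GPT4NER.

```python
def build_ner_prompt(entity_definitions, examples, sentence):
    """Assemble a few-shot NER prompt with definitions, worked examples, and a query."""
    lines = ["Entity definitions:"]
    for label, definition in entity_definitions.items():
        lines.append(f"- {label}: {definition}")
    lines.append("")
    lines.append("Examples:")
    for ex in examples:
        lines.append(f"Sentence: {ex['sentence']}")
        # The reasoning line plays the role of the chain-of-thought demonstration.
        lines.append(f"Reasoning: {ex['reasoning']}")
        lines.append(f"Entities: {ex['entities']}")
    lines.append("")
    lines.append(f"Sentence: {sentence}")
    # Ending on "Reasoning:" asks the model to explain before listing entities.
    lines.append("Reasoning:")
    return "\n".join(lines)


definitions = {
    "PER": "names of people",
    "LOC": "names of places",
}
few_shot = [
    {
        "sentence": "Ada Lovelace was born in London.",
        "reasoning": "Ada Lovelace is a person; London is a city.",
        "entities": "[PER: Ada Lovelace] [LOC: London]",
    }
]
prompt = build_ner_prompt(definitions, few_shot, "Alan Turing worked in Manchester.")
```

Framing the task this way is what turns sequence labeling into sequence generation: the model writes out the entities as text rather than tagging each token.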

CV Jul 15, 2018
Deep neural network ensemble by data augmentation and bagging for skin lesion classification

Manik Goyal, Jagath C. Rajapakse

This work summarizes our submission for Task 3 (Disease Classification) of the ISIC 2018 challenge in Skin Lesion Analysis Towards Melanoma Detection. We use a novel deep neural network (DNN) ensemble architecture introduced by us that can effectively classify skin lesions by using data augmentation and bagging to address paucity of data and prevent over-fitting. The ensemble is composed of two DNN architectures: Inception-v4 and Inception-Resnet-v2. The DNN architectures are combined into an ensemble by using a $1\times1$ convolution for fusion in a meta-learning layer.
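When the outputs of several models are stacked as channels, a $1\times1$ convolution with one output channel reduces to a learned weighted sum of the models' class scores. The sketch below illustrates that reading; the kernel values, shapes, and names are assumptions, and the paper's meta-learning layer may differ in detail.

```python
import numpy as np


def fuse_1x1(model_outputs, kernel, bias=0.0):
    """Weighted sum over the model axis, i.e. a 1x1 convolution with one output channel.

    model_outputs: array of shape (n_models, batch, n_classes)
    kernel: array of shape (n_models,), one weight per model (learned in practice)
    """
    return np.tensordot(kernel, model_outputs, axes=1) + bias


rng = np.random.default_rng(0)
batch, n_classes = 2, 7  # toy sizes

# Hypothetical class scores from the two networks in the ensemble.
scores_a = rng.random(size=(batch, n_classes))
scores_b = rng.random(size=(batch, n_classes))
stacked = np.stack([scores_a, scores_b])

# Equal weights reproduce simple averaging; training would adjust them.
kernel = np.array([0.5, 0.5])
fused = fuse_1x1(stacked, kernel)
predicted_class = fused.argmax(axis=1)
```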

CV Jul 13, 2018
Automatic segmentation of skin lesions using deep learning

Joshua Peter Ebenezer, Jagath C. Rajapakse

This paper summarizes the method used in our submission to Task 1 of the International Skin Imaging Collaboration's (ISIC) Skin Lesion Analysis Towards Melanoma Detection challenge held in 2018. We used a fully automated method to accurately segment lesion boundaries from dermoscopic images. A U-net deep learning network is trained on publicly available data from ISIC. We introduce the use of intensity, color, and texture enhancement operations as pre-processing steps and morphological operations and contour identification as post-processing steps.