CVNov 13, 2022Code
Enhancing Few-shot Image Classification with Cosine TransformerQuang-Huy Nguyen, Cuong Q. Nguyen, Dung D. Le et al. · cmu, deepmind
This paper addresses the few-shot image classification problem, where the classification task is performed on unlabeled query samples given a small amount of labeled support samples only. One major challenge of the few-shot learning problem is the large variety of object visual appearances that prevents the support samples to represent that object comprehensively. This might result in a significant difference between support and query samples, therefore undermining the performance of few-shot algorithms. In this paper, we tackle the problem by proposing Few-shot Cosine Transformer (FS-CT), where the relational map between supports and queries is effectively obtained for the few-shot tasks. The FS-CT consists of two parts, a learnable prototypical embedding network to obtain categorical representations from support samples with hard cases, and a transformer encoder to effectively achieve the relational map from two different support and query samples. We introduce Cosine Attention, a more robust and stable attention module that enhances the transformer module significantly and therefore improves FS-CT performance from 5% to over 20% in accuracy compared to the default scaled dot-product mechanism. Our method performs competitive results in mini-ImageNet, CUB-200, and CIFAR-FS on 1-shot learning and 5-shot learning tasks across backbones and few-shot configurations. We also developed a custom few-shot dataset for Yoga pose recognition to demonstrate the potential of our algorithm for practical application. Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied for a wide range of applications, such as healthcare, medical, and security surveillance. The official implementation code of our Few-shot Cosine Transformer is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer
CVNov 20, 2022
FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource Constrained Devices using Divide and Collaborative TrainingQuan Nguyen, Hieu H. Pham, Kok-Seng Wong et al. · cmu, deepmind
We introduce FedDCT, a novel distributed learning paradigm that enables the usage of large, high-performance CNNs on resource-limited edge devices. As opposed to traditional FL approaches, which require each client to train the full-size neural network independently during each training round, the proposed FedDCT allows a cluster of several clients to collaboratively train a large deep learning model by dividing it into an ensemble of several small sub-models and train them on multiple devices in parallel while maintaining privacy. In this collaborative training process, clients from the same cluster can also learn from each other, further improving their ensemble performance. In the aggregation stage, the server takes a weighted average of all the ensemble models trained by all the clusters. FedDCT reduces the memory requirements and allows low-end devices to participate in FL. We empirically conduct extensive experiments on standardized datasets, including CIFAR-10, CIFAR-100, and two real-world medical datasets HAM10000 and VAIPE. Experimental results show that FedDCT outperforms a set of current SOTA FL methods with interesting convergence behaviors. Furthermore, compared to other existing approaches, FedDCT achieves higher accuracy and substantially reduces the number of communication rounds (with $4-8$ times fewer memory requirements) to achieve the desired accuracy on the testing dataset without incurring any extra training cost on the server side.
LGMar 3, 2023
Backdoor Attacks and Defenses in Federated Learning: Survey, Challenges and Future Research DirectionsThuy Dung Nguyen, Tuan Nguyen, Phi Le Nguyen et al. · baidu
Federated learning (FL) is a machine learning (ML) approach that allows the use of distributed data without compromising personal privacy. However, the heterogeneous distribution of data among clients in FL can make it difficult for the orchestration server to validate the integrity of local model updates, making FL vulnerable to various threats, including backdoor attacks. Backdoor attacks involve the insertion of malicious functionality into a targeted model through poisoned updates from malicious clients. These attacks can cause the global model to misbehave on specific inputs while appearing normal in other cases. Backdoor attacks have received significant attention in the literature due to their potential to impact real-world deep learning applications. However, they have not been thoroughly studied in the context of FL. In this survey, we provide a comprehensive survey of current backdoor attack strategies and defenses in FL, including a comprehensive analysis of different approaches. We also discuss the challenges and potential future directions for attacks and defenses in the context of FL.
CVAug 15, 2022Code
Enhancing Deep Learning-based 3-lead ECG Classification with Heartbeat Counting and Demographic Data IntegrationKhiem H. Le, Hieu H. Pham, Thao B. T. Nguyen et al.
Nowadays, an increasing number of people are being diagnosed with cardiovascular diseases (CVDs), the leading cause of death globally. The gold standard for identifying these heart problems is via electrocardiogram (ECG). The standard 12-lead ECG is widely used in clinical practice and the majority of current research. However, using a lower number of leads can make ECG more pervasive as it can be integrated with portable or wearable devices. This article introduces two novel techniques to improve the performance of the current deep learning system for 3-lead ECG classification, making it comparable with models that are trained using standard 12-lead ECG. Specifically, we propose a multi-task learning scheme in the form of the number of heartbeats regression and an effective mechanism to integrate patient demographic data into the system. With these two advancements, we got classification performance in terms of F1 scores of 0.9796 and 0.8140 on two large-scale ECG datasets, i.e., Chapman and CPSC-2018, respectively, which surpassed current state-of-the-art ECG classification methods, even those trained on 12-lead data. To encourage further development, our source code is publicly available at https://github.com/lhkhiem28/LightX3ECG.
CVAug 7, 2022
Video-based Human Action Recognition using Deep Learning: A ReviewHieu H. Pham, Louahdi Khoudour, Alain Crouzil et al.
Human action recognition is an important application domain in computer vision. Its primary aim is to accurately describe human actions and their interactions from a previously unseen data sequence acquired by sensors. The ability to recognize, understand, and predict complex human actions enables the construction of many important applications such as intelligent surveillance systems, human-computer interfaces, health care, security, and military applications. In recent years, deep learning has been given particular attention by the computer vision community. This paper presents an overview of the current state-of-the-art in action recognition using video analysis with deep learning techniques. We present the most important deep learning models for recognizing human actions, and analyze them to provide the current progress of deep learning algorithms applied to solve human action recognition problems in realistic videos highlighting their advantages and disadvantages. Based on the quantitative analysis using recognition accuracies reported in the literature, our study identifies state-of-the-art deep architectures in action recognition and then provides current trends and open problems for future works in this field.
CVOct 5, 2022Code
Multi-stream Fusion for Class Incremental Learning in Pill Image ClassificationTrong-Tung Nguyen, Hieu H. Pham, Phi Le Nguyen et al.
Classifying pill categories from real-world images is crucial for various smart healthcare applications. Although existing approaches in image classification might achieve a good performance on fixed pill categories, they fail to handle novel instances of pill categories that are frequently presented to the learning algorithm. To this end, a trivial solution is to train the model with novel classes. However, this may result in a phenomenon known as catastrophic forgetting, in which the system forgets what it learned in previous classes. In this paper, we address this challenge by introducing the class incremental learning (CIL) ability to traditional pill image classification systems. Specifically, we propose a novel incremental multi-stream intermediate fusion framework enabling incorporation of an additional guidance information stream that best matches the domain of the problem into various state-of-the-art CIL methods. From this framework, we consider color-specific information of pill images as a guidance stream and devise an approach, namely "Color Guidance with Multi-stream intermediate fusion"(CG-IMIF) for solving CIL pill image classification task. We conduct comprehensive experiments on real-world incremental pill image classification dataset, namely VAIPE-PCIL, and find that the CG-IMIF consistently outperforms several state-of-the-art methods by a large margin in different task settings. Our code, data, and trained model are available at https://github.com/vinuni-vishc/CG-IMIF.
LGNov 25, 2023Code
MPCNN: A Novel Matrix Profile Approach for CNN-based Sleep Apnea ClassificationHieu X. Nguyen, Duong V. Nguyen, Hieu H. Pham et al.
Sleep apnea (SA) is a significant respiratory condition that poses a major global health challenge. Previous studies have investigated several machine and deep learning models for electrocardiogram (ECG)-based SA diagnoses. Despite these advancements, conventional feature extractions derived from ECG signals, such as R-peaks and RR intervals, may fail to capture crucial information encompassed within the complete PQRST segments. In this study, we propose an innovative approach to address this diagnostic gap by delving deeper into the comprehensive segments of the ECG signal. The proposed methodology draws inspiration from Matrix Profile algorithms, which generate an Euclidean distance profile from fixed-length signal subsequences. From this, we derived the Min Distance Profile (MinDP), Max Distance Profile (MaxDP), and Mean Distance Profile (MeanDP) based on the minimum, maximum, and mean of the profile distances, respectively. To validate the effectiveness of our approach, we use the modified LeNet-5 architecture as the primary CNN model, along with two existing lightweight models, BAFNet and SE-MSCNN, for ECG classification tasks. Our extensive experimental results on the PhysioNet Apnea-ECG dataset revealed that with the new feature extraction method, we achieved a per-segment accuracy up to 92.11 \% and a per-recording accuracy of 100\%. Moreover, it yielded the highest correlation compared to state-of-the-art methods, with a correlation coefficient of 0.989. By introducing a new feature extraction method based on distance relationships, we enhanced the performance of certain lightweight models, showing potential for home sleep apnea test (HSAT) and SA detection in IoT devices. The source code for this work is made publicly available in GitHub: https://github.com/vinuni-vishc/MPCNN-Sleep-Apnea.
CVNov 1, 2025Code
VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task LearningDang H. Nguyen, Hieu H. Pham, Hao T. Nguyen et al.
We present VinDr-CXR-VQA, a large-scale chest X-ray dataset for explainable Medical Visual Question Answering (Med-VQA) with spatial grounding. The dataset contains 17,597 question-answer pairs across 4,394 images, each annotated with radiologist-verified bounding boxes and clinical reasoning explanations. Our question taxonomy spans six diagnostic types-Where, What, Is there, How many, Which, and Yes/No-capturing diverse clinical intents. To improve reliability, we construct a balanced distribution of 41.7% positive and 58.3% negative samples, mitigating hallucinations in normal cases. Benchmarking with MedGemma-4B-it demonstrates improved performance (F1 = 0.624, +11.8% over baseline) while enabling lesion localization. VinDr-CXR-VQA aims to advance reproducible and clinically grounded Med-VQA research. The dataset and evaluation tools are publicly available at huggingface.co/datasets/Dangindev/VinDR-CXR-VQA.
IVMar 20, 2022
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammographyHieu T. Nguyen, Ha Q. Nguyen, Hieu H. Pham et al.
Mammography, or breast X-ray, is the most widely used imaging modality to detect cancer and other breast diseases. Recent studies have shown that deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have been developed to support physicians and improve the accuracy of interpreting mammography. However, most published datasets of mammography are either limited on sample size or digitalized from screen-film mammography (SFM), hindering the development of CADe and CADx tools which are developed based on full-field digital mammography (FFDM). To overcome this challenge, we introduce VinDr-Mammo - a new benchmark dataset of FFDM for detecting and diagnosing breast cancer and other diseases in mammography. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read with disagreement (if any) being resolved by arbitration. It is created for the assessment of Breast Imaging Reporting and Data System (BI-RADS) and density at the breast level. In addition, the dataset also provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available on PhysioNet as a new imaging resource to promote advances in developing CADe and CADx tools for breast cancer screening.
CVJul 25, 2022
LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-lead Electrocardiogram ClassificationKhiem H. Le, Hieu H. Pham, Thao BT. Nguyen et al.
Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices and most of the current research, standard 12-lead ECG is mainly used. However, using a lower number of leads can make ECG more prevalent as it can be conveniently recorded by portable or wearable devices. In this research, we develop a novel deep learning system to accurately identify multiple cardiovascular abnormalities by using only three ECG leads.
IVMar 20, 2022
PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in childrenHieu H. Pham, Ngoc H. Nguyen, Thanh T. Tran et al.
The development of diagnostic models for detecting and diagnosing pediatric diseases in CXR scans is undertaken due to the lack of high-quality physician-annotated datasets. To overcome this challenge, we introduce and release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases. In particular, each abnormal finding was identified via a rectangle bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 and a test set of 1,397. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the PediCXR data sample and make the dataset publicly available on https://physionet.org/content/pedicxr/1.0.0/
CVMar 20, 2022
Learning from Multiple Expert Annotators for Enhancing Anomaly Detection in Medical Image AnalysisKhiem H. Le, Tuan V. Tran, Hieu H. Pham et al.
Building an accurate computer-aided diagnosis system based on data-driven approaches requires a large amount of high-quality labeled data. In medical imaging analysis, multiple expert annotators often produce subjective estimates about "ground truth labels" during the annotation process, depending on their expertise and experience. As a result, the labeled data may contain a variety of human biases with a high rate of disagreement among annotators, which significantly affect the performance of supervised machine learning algorithms. To tackle this challenge, we propose a simple yet effective approach to combine annotations from multiple radiology experts for training a deep learning-based detector that aims to detect abnormalities on medical scans. The proposed method first estimates the ground truth annotations and confidence scores of training examples. The estimated annotations and their scores are then used to train a deep learning detector with a re-weighted loss function to localize abnormal findings. We conduct an extensive experimental evaluation of the proposed approach on both simulated and real-world medical imaging datasets. The experimental results show that our approach significantly outperforms baseline approaches that do not consider the disagreements among annotators, including methods in which all of the noisy annotations are treated equally as ground truth and the ensemble of different models trained on different label sets provided separately by annotators.
IVAug 10, 2022
Detecting COVID-19 from digitized ECG printouts using 1D convolutional neural networksThao Nguyen, Hieu H. Pham, Huy Khiem Le et al.
The COVID-19 pandemic has exposed the vulnerability of healthcare services worldwide, raising the need to develop novel tools to provide rapid and cost-effective screening and diagnosis. Clinical reports indicated that COVID-19 infection may cause cardiac injury, and electrocardiograms (ECG) may serve as a diagnostic biomarker for COVID-19. This study aims to utilize ECG signals to detect COVID-19 automatically. We propose a novel method to extract ECG signals from ECG paper records, which are then fed into a one-dimensional convolution neural network (1D-CNN) to learn and diagnose the disease. To evaluate the quality of digitized signals, R peaks in the paper-based ECG images are labeled. Afterward, RR intervals calculated from each image are compared to RR intervals of the corresponding digitized signal. Experiments on the COVID-19 ECG images dataset demonstrate that the proposed digitization method is able to capture correctly the original signals, with a mean absolute error of 28.11 ms. Our proposed 1D-CNN model, which is trained on the digitized ECG signals, allows identifying individuals with COVID-19 and other subjects accurately, with classification accuracies of 98.42%, 95.63%, and 98.50% for classifying COVID-19 vs. Normal, COVID-19 vs. Abnormal Heartbeats, and COVID-19 vs. other classes, respectively. Furthermore, the proposed method also achieves a high-level of performance for the multi-classification task. Our findings indicate that a deep learning system trained on digitized ECG signals can serve as a potential tool for diagnosing COVID-19.
IVMar 20, 2022
Phase Recognition in Contrast-Enhanced CT Scans based on Deep Learning and Random SamplingBinh T. Dao, Thang V. Nguyen, Hieu H. Pham et al.
A fully automated system for interpreting abdominal computed tomography (CT) scans with multiple phases of contrast enhancement requires an accurate classification of the phases. This work aims at developing and validating a precise, fast multi-phase classifier to recognize three main types of contrast phases in abdominal CT scans. We propose in this study a novel method that uses a random sampling mechanism on top of deep CNNs for the phase recognition of abdominal CT scans of four different phases: non-contrast, arterial, venous, and others. The CNNs work as a slice-wise phase prediction, while the random sampling selects input slices for the CNN models. Afterward, majority voting synthesizes the slice-wise results of the CNNs, to provide the final prediction at scan level. Our classifier was trained on 271,426 slices from 830 phase-annotated CT scans, and when combined with majority voting on 30% of slices randomly chosen from each scan, achieved a mean F1-score of 92.09% on our internal test set of 358 scans. The proposed method was also evaluated on 2 external test sets: CTPAC-CCRCC (N = 242) and LiTS (N = 131), which were annotated by our experts. Although a drop in performance has been observed, the model performance remained at a high level of accuracy with a mean F1-score of 76.79% and 86.94% on CTPAC-CCRCC and LiTS datasets, respectively. Our experimental results also showed that the proposed method significantly outperformed the state-of-the-art 3D approaches while requiring less computation time for inference.
IVAug 6, 2022
An Accurate and Explainable Deep Learning System Improves Interobserver Agreement in the Interpretation of Chest RadiographHieu H. Pham, Ha Q. Nguyen, Hieu T. Nguyen et al.
Recent artificial intelligence (AI) algorithms have achieved radiologist-level performance on various medical classification tasks. However, only a few studies addressed the localization of abnormal findings from CXR scans, which is essential in explaining the image-level classification to radiologists. We introduce in this paper an explainable deep learning system called VinDr-CXR that can classify a CXR scan into multiple thoracic diseases and, at the same time, localize most types of critical findings on the image. VinDr-CXR was trained on 51,485 CXR scans with radiologist-provided bounding box annotations. It demonstrated a comparable performance to experienced radiologists in classifying 6 common thoracic diseases on a retrospective validation set of 3,000 CXR scans, with a mean area under the receiver operating characteristic curve (AUROC) of 0.967 (95% confidence interval [CI]: 0.958-0.975). The VinDr-CXR was also externally validated in independent patient cohorts and showed its robustness. For the localization task with 14 types of lesions, our free-response receiver operating characteristic (FROC) analysis showed that the VinDr-CXR achieved a sensitivity of 80.2% at the rate of 1.0 false-positive lesion identified per scan. A prospective study was also conducted to measure the clinical impact of the VinDr-CXR in assisting six experienced radiologists. The results indicated that the proposed system, when used as a diagnosis supporting tool, significantly improved the agreement between radiologists themselves with an increase of 1.5% in mean Fleiss' Kappa. We also observed that, after the radiologists consulted VinDr-CXR's suggestions, the agreement between each of them and the system was remarkably increased by 3.3% in mean Cohen's Kappa.
IVSep 11, 2022
Learning to diagnose common thorax diseases on chest radiographs from radiology reports in VietnameseThao T. B. Nguyen, Tam M. Vo, Thang V. Nguyen et al.
We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with various model architectures to select the one that delivers the best performance. Our best model (CheXpert-pretrained EfficientNet-B2) yields an F1-score of 0.6989 (95% CI 0.6740, 0.7240), AUC of 0.7912, sensitivity of 0.7064 and specificity of 0.8760 for the abnormal diagnosis in general. Finally, we demonstrate that our coarse classification (based on five specific locations of abnormalities) yields comparable results to fine classification (twelve pathologies) on the benchmark CheXpert dataset for general anomaly detection while delivering better performance in terms of the average performance of all classes.
CVMar 20, 2022
A Novel Transparency Strategy-based Data Augmentation Approach for BI-RADS Classification of MammogramsSam B. Tran, Huyen T. X. Nguyen, Chi Phan et al.
Image augmentation techniques have been widely investigated to improve the performance of deep learning (DL) algorithms on mammography classification tasks. Recent methods have proved the efficiency of image augmentation on data deficiency or data imbalance issues. In this paper, we propose a novel transparency strategy to boost the Breast Imaging Reporting and Data System (BI-RADS) scores of mammogram classifiers. The proposed approach utilizes the Region of Interest (ROI) information to generate more high-risk training examples for breast cancer (BI-RADS 3, 4, 5) from original images. Our extensive experiments on three different datasets show that the proposed approach significantly improves the mammogram classification performance and surpasses a state-of-the-art data augmentation technique called CutMix. This study also highlights that our transparency method is more effective than other augmentation strategies for BI-RADS classification and can be widely applied to other computer vision tasks.
CVNov 9, 2023
TransReg: Cross-transformer as auto-registration module for multi-view mammogram mass detectionHoang C. Nguyen, Chi Phan, Hieu H. Pham
Screening mammography is the most widely used method for early breast cancer detection, significantly reducing mortality rates. The integration of information from multi-view mammograms enhances radiologists' confidence and diminishes false-positive rates since they can examine on dual-view of the same breast to cross-reference the existence and location of the lesion. Inspired by this, we present TransReg, a Computer-Aided Detection (CAD) system designed to exploit the relationship between craniocaudal (CC), and mediolateral oblique (MLO) views. The system includes cross-transformer to model the relationship between the region of interest (RoIs) extracted by siamese Faster RCNN network for mass detection problems. Our work is the first time cross-transformer has been integrated into an object detection framework to model the relation between ipsilateral views. Our experimental evaluation on DDSM and VinDr-Mammo datasets shows that our TransReg, equipped with SwinT as a feature extractor achieves state-of-the-art performance. Specifically, at the false positive rate per image at 0.5, TransReg using SwinT gets a recall at 83.3% for DDSM dataset and 79.7% for VinDr-Mammo dataset. Furthermore, we conduct a comprehensive analysis to demonstrate that cross-transformer can function as an auto-registration module, aligning the masses in dual-view and utilizing this information to inform final predictions. It is a replication diagnostic workflow of expert radiologists
CVAug 5, 2022
Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent SlicesDat T. Ngo, Thao T. B. Nguyen, Hieu T. Nguyen et al.
The rapid development in representation learning techniques such as deep neural networks and the availability of large-scale, well-annotated medical imaging datasets have to a rapid increase in the use of supervised machine learning in the 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imaging community to assist clinicians and medical experts in disease diagnosis and treatment. However, training and inferencing deep neural networks such as D-CNN on high-resolution 3D volumes of Computed Tomography (CT) scans for diagnostic tasks pose formidable computational challenges. This challenge raises the need of developing deep learning-based approaches that are robust in learning representations in 2D images, instead 3D scans. In this work, we propose for the first time a new strategy to train \emph{slice-level} classifiers on CT scans based on the descriptors of the adjacent slices along the axis. In particular, each of which is extracted through a convolutional neural network (CNN). This method is applicable to CT datasets with per-slice labels such as the RSNA Intracranial Hemorrhage (ICH) dataset, which aims to predict the presence of ICH and classify it into 5 different sub-types. We obtain a single model in the top 4% best-performing solutions of the RSNA ICH challenge, where model ensembles are allowed. Experiments also show that the proposed method significantly outperforms the baseline model on CQ500. The proposed method is general and can be applied to other 3D medical diagnosis tasks such as MRI imaging. To encourage new advances in the field, we will make our codes and pre-trained model available upon acceptance of the paper.
CVMar 29, 2023
Improving Object Detection in Medical Image Analysis through Multiple Expert Annotators: An Empirical InvestigationHieu H. Pham, Khiem H. Le, Tuan V. Tran et al.
The work discusses the use of machine learning algorithms for anomaly detection in medical image analysis and how the performance of these algorithms depends on the number of annotators and the quality of labels. To address the issue of subjectivity in labeling with a single annotator, we introduce a simple and effective approach that aggregates annotations from multiple annotators with varying levels of expertise. We then aim to improve the efficiency of predictive models in abnormal detection tasks by estimating hidden labels from multiple annotations and using a re-weighted loss function to improve detection performance. Our method is evaluated on a real-world medical imaging dataset and outperforms relevant baselines that do not consider disagreements among annotators.
IVApr 1, 2023
Evaluating the impact of an explainable machine learning system on the interobserver agreement in chest radiograph interpretationHieu H. Pham, Ha Q. Nguyen, Hieu T. Nguyen et al.
We conducted a prospective study to measure the clinical impact of an explainable machine learning system on interobserver agreement in chest radiograph interpretation. The AI system, which we call as it VinDr-CXR when used as a diagnosis-supporting tool, significantly improved the agreement between six radiologists with an increase of 1.5% in mean Fleiss' Kappa. In addition, we also observed that, after the radiologists consulted AI's suggestions, the agreement between each radiologist and the system was remarkably increased by 3.3% in mean Cohen's Kappa. This work has been accepted for publication in IEEE Access and this paper is our short version submitted to the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, USA.
LGJan 8Code
FedKDX: Federated Learning with Negative Knowledge Distillation for Enhanced Healthcare AI SystemsQuang-Tu Pham, Hoang-Dieu Vu, Dinh-Dat Pham et al.
This paper introduces FedKDX, a federated learning framework that addresses limitations in healthcare AI through Negative Knowledge Distillation (NKD). Unlike existing approaches that focus solely on positive knowledge transfer, FedKDX captures both target and non-target information to improve model generalization in healthcare applications. The framework integrates multiple knowledge transfer techniques--including traditional knowledge distillation, contrastive learning, and NKD--within a unified architecture that maintains privacy while reducing communication costs. Through experiments on healthcare datasets (SLEEP, UCI-HAR, and PAMAP2), FedKDX demonstrates improved accuracy (up to 2.53% over state-of-the-art methods), faster convergence, and better performance on non-IID data distributions. Theoretical analysis supports NKD's contribution to addressing statistical heterogeneity in distributed healthcare data. The approach shows promise for privacy-sensitive medical applications under regulatory frameworks like HIPAA and GDPR, offering a balanced solution between performance and practical implementation requirements in decentralized healthcare settings. The code and model are available at https://github.com/phamdinhdat-ai/Fed_2024.
LGJun 27, 2025Code
Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor DataHoang-Dieu Vu, Duc-Nghia Tran, Quang-Tu Pham et al.
This paper introduces Smooth-Distill, a novel self-distillation framework designed to simultaneously perform human activity recognition (HAR) and sensor placement detection using wearable sensor data. The proposed approach utilizes a unified CNN-based architecture, MTL-net, which processes accelerometer data and branches into two outputs for each respective task. Unlike conventional distillation methods that require separate teacher and student models, the proposed framework utilizes a smoothed, historical version of the model itself as the teacher, significantly reducing training computational overhead while maintaining performance benefits. To support this research, we developed a comprehensive accelerometer-based dataset capturing 12 distinct sleep postures across three different wearing positions, complementing two existing public datasets (MHealth and WISDM). Experimental results show that Smooth-Distill consistently outperforms alternative approaches across different evaluation scenarios, achieving notable improvements in both human activity recognition and device placement detection tasks. This method demonstrates enhanced stability in convergence patterns during training and exhibits reduced overfitting compared to traditional multitask learning baselines. This framework contributes to the practical implementation of knowledge distillation in human activity recognition systems, offering an effective solution for multitask learning with accelerometer data that balances accuracy and training efficiency. More broadly, it reduces the computational cost of model training, which is critical for scenarios requiring frequent model updates or training on resource-constrained platforms. The code and model are available at https://github.com/Kuan2vn/smooth\_distill.
CVFeb 22, 2025
NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional InteractionsTue M. Cao, Nhat X. Hoang, Hieu H. Pham et al.
Understanding the inner workings of neural networks is essential for enhancing model performance and interpretability. Current research predominantly focuses on examining the connection between individual neurons and the model's final predictions. Which suffers from challenges in interpreting the internal workings of the model, particularly when neurons encode multiple unrelated features. In this paper, we propose a novel framework that transitions the focus from analyzing individual neurons to investigating groups of neurons, shifting the emphasis from neuron-output relationships to functional interaction between neurons. Our automated framework, NeurFlow, first identifies core neurons and clusters them into groups based on shared functional relationships, enabling a more coherent and interpretable view of the network's internal processes. This approach facilitates the construction of a hierarchical circuit representing neuron interactions across layers, thus improving interpretability while reducing computational costs. Our extensive empirical studies validate the fidelity of our proposed NeurFlow. Additionally, we showcase its utility in practical applications such as image debugging and automatic concept labeling, thereby highlighting its potential to advance the field of neural network explainability.
CRNov 5, 2024
FedBlock: A Blockchain Approach to Federated Learning against Backdoor AttacksDuong H. Nguyen, Phi L. Nguyen, Truong T. Nguyen et al.
Federated Learning (FL) is a machine learning method for training with private data locally stored in distributed machines without gathering them into one place for central learning. Despite its promises, FL is prone to critical security risks. First, because FL depends on a central server to aggregate local training models, this is a single point of failure. The server might function maliciously. Second, due to its distributed nature, FL might encounter backdoor attacks by participating clients. They can poison the local model before submitting to the server. Either type of attack, on the server or the client side, would severely degrade learning accuracy. We propose FedBlock, a novel blockchain-based FL framework that addresses both of these security risks. FedBlock is uniquely desirable in that it involves only smart contract programming, thus deployable atop any blockchain network. Our framework is substantiated with a comprehensive evaluation study using real-world datasets. Its robustness against backdoor attacks is competitive with the literature of FL backdoor defense. The latter, however, does not address the server risk as we do.
CVFeb 21, 2024
MSTAR: Multi-Scale Backbone Architecture Search for Timeseries ClassificationTue M. Cao, Nhat H. Tran, Hieu H. Pham et al.
Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design and yet not being able to reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural architecture search (NAS), which addresses both the problem of frequency and time resolution, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches the state-of-the-art performance on four datasets on four different domains while introducing more than ten highly fine-tuned models for each data.
CVSep 7, 2025
ConstStyle: Robust Domain Generalization with Unified Style TransformationNam Duong Tran, Nam Nguyen Phuong, Hieu H. Pham et al.
Deep neural networks often suffer performance drops when test data distribution differs from training data. Domain Generalization (DG) aims to address this by focusing on domain-invariant features or augmenting data for greater diversity. However, these methods often struggle with limited training domains or significant gaps between seen (training) and unseen (test) domains. To enhance DG robustness, we hypothesize that it is essential for the model to be trained on data from domains that closely resemble unseen test domains-an inherently difficult task due to the absence of prior knowledge about the unseen domains. Accordingly, we propose ConstStyle, a novel approach that leverages a unified domain to capture domain-invariant features and bridge the domain gap with theoretical analysis. During training, all samples are mapped onto this unified domain, optimized for seen domains. During testing, unseen domain samples are projected similarly before predictions. By aligning both training and testing data within this unified domain, ConstStyle effectively reduces the impact of domain shifts, even with large domain gaps or few seen domains. Extensive experiments demonstrate that ConstStyle consistently outperforms existing methods across diverse scenarios. Notably, when only a limited number of seen domains are available, ConstStyle can boost accuracy up to 19.82\% compared to the next best approach.
IVDec 8, 2021
A novel multi-view deep learning approach for BI-RADS and density assessment of mammogramsHuyen T. X. Nguyen, Sam B. Tran, Dung B. Nguyen et al.
Advanced deep learning (DL) algorithms may predict the patient's risk of developing breast cancer based on the Breast Imaging Reporting and Data System (BI-RADS) and density standards. Recent studies have suggested that the combination of multi-view analysis improved the overall breast exam classification. In this paper, we propose a novel multi-view DL approach for BI-RADS and density assessment of mammograms. The proposed approach first deploys deep convolutional networks for feature extraction on each view separately. The extracted features are then stacked and fed into a Light Gradient Boosting Machine (LightGBM) classifier to predict BI-RADS and density scores. We conduct extensive experiments on both the internal mammography dataset and the public dataset Digital Database for Screening Mammography (DDSM). The experimental results demonstrate that the proposed approach outperforms the single-view classification approach on two benchmark datasets by huge F1-score margins (+5% on the internal dataset and +10% on the DDSM dataset). These results highlight the vital role of combining multi-view information to improve the performance of breast cancer risk prediction.
IVAug 14, 2021
DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray ScansHieu H. Pham, Dung V. Do, Ha Q. Nguyen
X-ray imaging in DICOM format is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This leads to an obstacle in deploying AI solutions for analyzing medical images, which often requires identifying the right body part before feeding the image into a specified AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Unfortunately, to the best of our knowledge, there is no open tool or framework for this task to date. To fill this lack, we introduce a DICOM Imaging Router that deploys deep CNNs for categorizing unknown DICOM X-ray images into five anatomical groups: abdominal, adult chest, pediatric chest, spine, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images has been collected and manually classified. We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were then evaluated on an independent test set of 2,419 images and showed superior performance in classifying the body parts. Specifically, our best performing model achieved a recall of 0.982 (95% CI, 0.977-0.988), a precision of 0.985 (95% CI, 0.975-0.989) and a F1-score of 0.981 (95% CI, 0.976-0.987), whilst requiring less computation for inference (0.0295 second per image). Our external validity on 1,000 X-ray images shows the robustness of the proposed approach across hospitals. These remarkable performances indicate that deep CNNs can accurately and effectively differentiate human body parts from X-ray scans, thereby providing potential benefits for a wide range of applications in clinical settings. The dataset, codes, and trained deep learning models from this study will be made publicly available on our project website at https://vindr.ai/.
IVAug 14, 2021
Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural NetworksThanh T. Tran, Hieu H. Pham, Thang V. Nguyen et al.
Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of understanding of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXR in adults. However, there is a lack of evidence indicating that D-CNNs can recognize accurately multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for the detection of pediatric chest diseases faces significant challenges such as (i) lack of physician-annotated datasets and (ii) class imbalance problems. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, for which each is manually labeled by an experienced radiologist for the presence of 10 common pathologies. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the high-class imbalance issue, we propose to modify and apply "Distribution-Balanced loss" for training D-CNNs which reshapes the standard Binary-Cross Entropy loss (BCE) to efficiently learn harder samples by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared to other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.
IVJul 3, 2021
VinDr-RibCXR: A Benchmark Dataset for Automatic Segmentation and Labeling of Individual Ribs on Chest X-raysHoang C. Nguyen, Tung T. Le, Hieu H. Pham et al.
We introduce a new benchmark dataset, namely VinDr-RibCXR, for automatic segmentation and labeling of individual ribs from chest X-ray (CXR) scans. The VinDr-RibCXR contains 245 CXRs with corresponding ground truth annotations provided by human experts. A set of state-of-the-art segmentation models are trained on 196 images from the VinDr-RibCXR to segment and label 20 individual ribs. Our best performing model obtains a Dice score of 0.834 (95% CI, 0.810--0.853) on an independent test set of 49 images. Our study, therefore, serves as a proof of concept and baseline performance for future research.
IVJun 24, 2021
VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographsHieu T. Nguyen, Hieu H. Pham, Nghia T. Nguyen et al.
Radiographs are used as the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at developing and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset, comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings amongst the total 13. The VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It demonstrates an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision (mAP@0.5) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, codes, and trained deep learning models are made publicly available.
CVMay 25, 2020
Interpreting Chest X-rays via CNNs that Exploit Hierarchical Disease Dependencies and Uncertainty LabelsHieu H. Pham, Tung T. Le, Dat T. Ngo et al.
The chest X-rays (CXRs) is one of the views most commonly ordered by radiologists (NHS),which is critical for diagnosis of many different thoracic diseases. Accurately detecting thepresence of multiple diseases from CXRs is still a challenging task. We present a multi-labelclassification framework based on deep convolutional neural networks (CNNs) for diagnos-ing the presence of 14 common thoracic diseases and observations. Specifically, we trained astrong set of CNNs that exploit dependencies among abnormality labels and used the labelsmoothing regularization (LSR) for a better handling of uncertain samples. Our deep net-works were trained on over 200,000 CXRs of the recently released CheXpert dataset (Irvinandal., 2019) and the final model, which was an ensemble of the best performing networks,achieved a mean area under the curve (AUC) of 0.940 in predicting 5 selected pathologiesfrom the validation set. To the best of our knowledge, this is the highest AUC score yetreported to date. More importantly, the proposed method was also evaluated on an inde-pendent test set of the CheXpert competition, containing 500 CXR studies annotated by apanel of 5 experienced radiologists. The reported performance was on average better than2.6 out of 3 other individual radiologists with a mean AUC of 0.930, which had led to thecurrent state-of-the-art performance on the CheXpert test set.
IVNov 15, 2019
Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labelsHieu H. Pham, Tung T. Le, Dat Q. Tran et al.
Chest radiography is one of the most common types of diagnostic radiology exams, which is critical for screening and diagnosis of many different thoracic diseases. Specialized algorithms have been developed to detect several specific pathologies such as lung nodule or lung cancer. However, accurately detecting the presence of multiple diseases from chest X-rays (CXRs) is still a challenging task. This paper presents a supervised multi-label classification framework based on deep convolutional neural networks (CNNs) for predicting the risk of 14 common thoracic diseases. We tackle this problem by training state-of-the-art CNNs that exploit dependencies among abnormality labels. We also propose to use the label smoothing technique for a better handling of uncertain samples, which occupy a significant portion of almost every CXR dataset. Our model is trained on over 200,000 CXRs of the recently released CheXpert dataset and achieves a mean area under the curve (AUC) of 0.940 in predicting 5 selected pathologies from the validation set. This is the highest AUC score yet reported to date. The proposed method is also evaluated on the independent test set of the CheXpert competition, which is composed of 500 CXR studies annotated by a panel of 5 experienced radiologists. The performance is on average better than 2.6 out of 3 other individual radiologists with a mean AUC of 0.930, which ranks first on the CheXpert leaderboard at the time of writing this paper.