IVJul 2, 2023Code
ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation PerformanceJiayu Huo, Yang Liu, Xi Ouyang et al.
Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model's robustness. However, they often introduce intensity disparities between foreground and background areas and boundary artifacts, which weakens the effectiveness of such strategies. In this paper, we propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic. In particular, we propose an Adaptive Region Harmonization (ARH) module to dynamically align foreground feature maps to the background with an attention mechanism. We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images. Experimental results on the ATLAS 2.0 dataset show that ARHNet outperforms other methods for image harmonization tasks, and boosts the down-stream segmentation performance. Our code is publicly available at https://github.com/King-HAW/ARHNet.
IVApr 6, 2022
Follow My Eye: Using Gaze to Supervise Computer-Aided DiagnosisSheng Wang, Xi Ouyang, Tianming Liu et al.
When deep neural network (DNN) was first introduced to the medical image analysis community, researchers were impressed by its performance. However, it is evident now that a large number of manually labeled data is often a must to train a properly functioning DNN. This demand for supervision data and labels is a major bottleneck in current medical image analysis, since collecting a large number of annotations from experienced experts can be time-consuming and expensive. In this paper, we demonstrate that the eye movement of radiologists reading medical images can be a new form of supervision to train the DNN-based computer-aided diagnosis (CAD) system. Particularly, we record the tracks of the radiologists' gaze when they are reading images. The gaze information is processed and then used to supervise the DNN's attention via an Attention Consistency module. To the best of our knowledge, the above pipeline is among the earliest efforts to leverage expert eye movement for deep-learning-based CAD. We have conducted extensive experiments on knee X-ray images for osteoarthritis assessment. The results show that our method can achieve considerable improvement in diagnosis performance, with the help of gaze supervision.
CVFeb 14, 2023
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language ModelsSheng Wang, Zihao Zhao, Xi Ouyang et al.
Large language models (LLMs) have recently demonstrated their potential in clinical applications, providing valuable medical knowledge and advice. For example, a large dialog LLM like ChatGPT has successfully passed part of the US medical licensing exam. However, LLMs currently have difficulty processing images, making it challenging to interpret information from medical images, which are rich in information that supports clinical decisions. On the other hand, computer-aided diagnosis (CAD) networks for medical images have seen significant success in the medical field by using advanced deep-learning algorithms to support clinical decision-making. This paper presents a method for integrating LLMs into medical-image CAD networks. The proposed framework uses LLMs to enhance the output of multiple CAD networks, such as diagnosis networks, lesion segmentation networks, and report generation networks, by summarizing and reorganizing the information presented in natural language text format. The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models to create a more user-friendly and understandable system for patients compared to conventional CAD systems. In the future, LLM's medical knowledge can be also used to improve the performance of vision-based medical-image CAD models.
IVJul 19, 2022
Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and DetectionZhenrong Shen, Xi Ouyang, Bin Xiao et al.
Lung nodule detection in chest X-ray (CXR) images is common to early screening of lung cancers. Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists for nodule screening in CXR. However, it requires large-scale and diverse medical data with high-quality annotations to train such robust and accurate CADs. To alleviate the limited availability of such datasets, lung nodule synthesis methods are proposed for the sake of data augmentation. Nevertheless, previous methods lack the ability to generate nodules that are realistic with the size attribute desired by the detector. To address this issue, we introduce a novel lung nodule synthesis framework in this paper, which decomposes nodule attributes into three main aspects including shape, size, and texture, respectively. A GAN-based Shape Generator firstly models nodule shapes by generating diverse shape masks. The following Size Modulation then enables quantitative control on the diameters of the generated nodule shapes in pixel-level granularity. A coarse-to-fine gated convolutional Texture Generator finally synthesizes visually plausible nodule textures conditioned on the modulated shape masks. Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task. Our experiments demonstrate the enhanced image quality, diversity, and controllability of the proposed lung nodule synthesis framework. We also validate the effectiveness of our data augmentation on greatly improving nodule detection performance.
CVApr 20, 2023
Domain Generalization for Mammographic Image Analysis with Contrastive LearningZheren Li, Zhiming Cui, Lichi Zhang et al.
The deep learning technique has been shown to be effectively addressed several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners of vendors. But, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.
IVMar 27, 2024Code
Generative Medical SegmentationJiayu Huo, Xi Ouyang, Sébastien Ourselin et al.
Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models follow the discriminative pixel-wise classification learning paradigm and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medical Segmentation (GMS), a novel approach leveraging a generative model to perform image segmentation. Concretely, GMS employs a robust pre-trained vision foundation model to extract latent representations for images and corresponding ground truth masks, followed by a model that learns a mapping function from the image to the mask in the latent space. Once trained, the model generates an estimated segmentation mask using the pre-trained vision foundation model to decode the predicted latent representation back into the image space. The design of GMS leads to fewer trainable parameters in the model which reduces the risk of overfitting and enhances its generalization capability. Our experimental analysis across five public datasets in different medical imaging domains demonstrates GMS outperforms existing discriminative and generative segmentation models. Furthermore, GMS is able to generalize well across datasets from different centers within the same imaging modality. Our experiments suggest GMS offers a scalable and effective solution for medical image segmentation. GMS implementation and trained model weights are available at https://github.com/King-HAW/GMS.
CVDec 23, 2021Code
Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and DiagnosisXi Ouyang, Srikrishna Karanam, Ziyan Wu et al.
We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns trusting these algorithm results for diagnostic decision-making purposes because of a general lack of algorithm decision reasoning and interpretability. One potential way to address this problem is to further train these models to localize abnormalities in addition to just classifying them. However, doing this accurately will require a large amount of disease localization annotations by clinical experts, a task that is prohibitively expensive to accomplish for most applications. In this work, we take a step towards addressing these issues by means of a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues. On two large-scale chest X-ray datasets (NIH ChestX-ray14 and CheXpert), we demonstrate significant localization performance improvements over the current state of the art while also achieving competitive classification performance. Our code is available on https://github.com/oyxhust/HAM.
CVMay 15, 2023
Learning Better Contrastive View from Radiologist's GazeSheng Wang, Zixu Zhuang, Xi Ouyang et al.
Recent self-supervised contrastive learning methods greatly benefit from the Siamese structure that aims to minimizing distances between positive pairs. These methods usually apply random data augmentation to input images, expecting the augmented views of the same images to be similar and positively paired. However, random augmentation may overlook image semantic information and degrade the quality of augmented views in contrastive learning. This issue becomes more challenging in medical images since the abnormalities related to diseases can be tiny, and are easy to be corrupted (e.g., being cropped out) in the current scheme of random augmentation. In this work, we first demonstrate that, for widely-used X-ray images, the conventional augmentation prevalent in contrastive pre-training can affect the performance of the downstream diagnosis or classification tasks. Then, we propose a novel augmentation method, i.e., FocusContrast, to learn from radiologists' gaze in diagnosis and generate contrastive views for medical images with guidance from radiologists' visual attention. Specifically, we track the gaze movement of radiologists and model their visual attention when reading to diagnose X-ray images. The learned model can predict visual attention of the radiologists given a new input image, and further guide the attention-aware augmentation that hardly neglects the disease-related abnormalities. As a plug-and-play and framework-agnostic module, FocusContrast consistently improves state-of-the-art contrastive learning methods of SimCLR, MoCo, and BYOL by 4.0~7.0% in classification accuracy on a knee X-ray dataset.
IVJan 12, 2022
Knee Cartilage Defect Assessment by Graph Representation and Surface ConvolutionZixu Zhuang, Liping Si, Sheng Wang et al.
Knee osteoarthritis (OA) is the most common osteoarthritis and a leading cause of disability. Cartilage defects are regarded as major manifestations of knee OA, which are visible by magnetic resonance imaging (MRI). Thus early detection and assessment for knee cartilage defects are important for protecting patients from knee OA. In this way, many attempts have been made on knee cartilage defect assessment by applying convolutional neural networks (CNNs) to knee MRI. However, the physiologic characteristics of the cartilage may hinder such efforts: the cartilage is a thin curved layer, implying that only a small portion of voxels in knee MRI can contribute to the cartilage defect assessment; heterogeneous scanning protocols further challenge the feasibility of the CNNs in clinical practice; the CNN-based knee cartilage evaluation results lack interpretability. To address these challenges, we model the cartilages structure and appearance from knee MRI into a graph representation, which is capable of handling highly diverse clinical data. Then, guided by the cartilage graph representation, we design a non-Euclidean deep learning network with the self-attention mechanism, to extract cartilage features in the local and global, and to derive the final assessment with a visualized result. Our comprehensive experiments show that the proposed method yields superior performance in knee cartilage defect assessment, plus its convenient 3D visualization for interpretability.
IVNov 21, 2021
Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive LearningZheren Li, Zhiming Cui, Sheng Wang et al.
Lesion detection is a fundamental problem in the computer-aided diagnosis scheme for mammography. The advance of deep learning techniques have made a remarkable progress for this task, provided that the training data are large and sufficiently diverse in terms of image style and quality. In particular, the diversity of image style may be majorly attributed to the vendor factor. However, the collection of mammograms from vendors as many as possible is very expensive and sometimes impractical for laboratory-scale studies. Accordingly, to further augment the generalization capability of deep learning model to various vendors with limited resources, a new contrastive learning scheme is developed. Specifically, the backbone network is firstly trained with a multi-style and multi-view unsupervised self-learning scheme for the embedding of invariant features to various vendor-styles. Afterward, the backbone network is then recalibrated to the downstream task of lesion detection with the specific supervised learning. The proposed method is evaluated with mammograms from four vendors and one unseen public dataset. The experimental results suggest that our approach can effectively improve detection performance on both seen and unseen domains, and outperforms many state-of-the-art (SOTA) generalization methods.
IVMay 25, 2020
mr2NST: Multi-Resolution and Multi-Reference Neural Style Transfer for MammographySheng Wang, Jiayu Huo, Xi Ouyang et al.
Computer-aided diagnosis with deep learning techniques has been shown to be helpful for the diagnosis of the mammography in many clinical studies. However, the image styles of different vendors are very distinctive, and there may exist domain gap among different vendors that could potentially compromise the universal applicability of one deep learning model. In this study, we explicitly address style variety issue with the proposed multi-resolution and multi-reference neural style transfer (mr2NST) network. The mr2NST can normalize the styles from different vendors to the same style baseline with very high resolution. We illustrate that the image quality of the transferred images is comparable to the quality of original images of the target domain (vendor) in terms of NIMA scores. Meanwhile, the mr2NST results are also shown to be helpful for the lesion detection in mammograms.
IVMay 19, 2020
A Self-ensembling Framework for Semi-supervised Knee Cartilage Defects Assessment with Dual-ConsistencyJiayu Huo, Liping Si, Xi Ouyang et al.
Knee osteoarthritis (OA) is one of the most common musculoskeletal disorders and requires early-stage diagnosis. Nowadays, the deep convolutional neural networks have achieved greatly in the computer-aided diagnosis field. However, the construction of the deep learning models usually requires great amounts of annotated data, which is generally high-cost. In this paper, we propose a novel approach for knee cartilage defects assessment, including severity classification and lesion localization. This can be treated as a subtask of knee OA diagnosis. Particularly, we design a self-ensembling framework, which is composed of a student network and a teacher network with the same structure. The student network learns from both labeled data and unlabeled data and the teacher network averages the student model weights through the training course. A novel attention loss function is developed to obtain accurate attention masks. With dual-consistency checking of the attention in the lesion classification and localization, the two networks can gradually optimize the attention distribution and improve the performance of each other, whereas the training relies on partially labeled data only and follows the semi-supervised manner. Experiments show that the proposed method can significantly improve the self-ensembling performance in both knee cartilage defects classification and localization, and also greatly reduce the needs of annotated data.
CVMay 6, 2020
Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired PneumoniaXi Ouyang, Jiayu Huo, Liming Xia et al.
The coronavirus disease (COVID-19) is rapidly spreading all over the world, and has infected more than 1,436,000 people in more than 200 countries and territories as of April 9, 2020. Detecting COVID-19 at early stage is essential to deliver proper healthcare to the patients and also to protect the uninfected population. To this end, we develop a dual-sampling attention network to automatically diagnose COVID- 19 from the community acquired pneumonia (CAP) in chest computed tomography (CT). In particular, we propose a novel online attention module with a 3D convolutional network (CNN) to focus on the infection regions in lungs when making decisions of diagnoses. Note that there exists imbalanced distribution of the sizes of the infection regions between COVID-19 and CAP, partially due to fast progress of COVID-19 after symptom onset. Therefore, we develop a dual-sampling strategy to mitigate the imbalanced learning. Our method is evaluated (to our best knowledge) upon the largest multi-center CT data for COVID-19 from 8 hospitals. In the training-validation stage, we collect 2186 CT scans from 1588 patients for a 5-fold cross-validation. In the testing stage, we employ another independent large-scale testing dataset including 2796 CT scans from 2057 patients. Results show that our algorithm can identify the COVID-19 images with the area under the receiver operating characteristic curve (AUC) value of 0.944, accuracy of 87.5%, sensitivity of 86.9%, specificity of 90.1%, and F1-score of 82.0%. With this performance, the proposed algorithm could potentially aid radiologists with COVID-19 diagnosis from CAP, especially in the early stage of the COVID-19 outbreak.
CVApr 5, 2018
Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and BeyondXi Ouyang, Yu Cheng, Yifan Jiang et al.
State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require lots of annotation information and the labeling process usually takes much time and efforts. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle the pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images of variations and different levels of details. To quantitatively evaluate our approach, we add the generated samples into training data of the baseline pedestrian detectors and show the synthetic images are able to improve the detectors' performance.
NINov 7, 2017
ZipNet-GAN: Inferring Fine-grained Mobile Traffic Patterns via a Generative Adversarial Neural NetworkChaoyun Zhang, Xi Ouyang, Paul Patras
Large-scale mobile traffic analytics is becoming essential to digital infrastructure provisioning, public transportation, events planning, and other domains. Monitoring city-wide mobile traffic is however a complex and costly process that relies on dedicated probes. Some of these probes have limited precision or coverage, others gather tens of gigabytes of logs daily, which independently offer limited insights. Extracting fine-grained patterns involves expensive spatial aggregation of measurements, storage, and post-processing. In this paper, we propose a mobile traffic super-resolution technique that overcomes these problems by inferring narrowly localised traffic consumption from coarse measurements. We draw inspiration from image processing and design a deep-learning architecture tailored to mobile networking, which combines Zipper Network (ZipNet) and Generative Adversarial neural Network (GAN) models. This enables to uniquely capture spatio-temporal relations between traffic volume snapshots routinely monitored over broad coverage areas (`low-resolution') and the corresponding consumption at 0.05 km $^2$ level (`high-resolution') usually obtained after intensive computation. Experiments we conduct with a real-world data set demonstrate that the proposed ZipNet(-GAN) infers traffic consumption with remarkable accuracy and up to 100$\times$ higher granularity as compared to standard probing, while outperforming existing data interpolation techniques. To our knowledge, this is the first time super-resolution concepts are applied to large-scale mobile traffic analysis and our solution is the first to infer fine-grained urban traffic patterns from coarse aggregates.