DongAo Ma

CV
h-index14
3papers
101citations
Novelty48%
AI Score39

3 Papers

CVOct 14, 2023Code
Foundation Ark: Accruing and Reusing Knowledge for Superior and Robust Performance

DongAo Ma, Jiaxuan Pang, Michael B. Gotway et al.

Deep learning nowadays offers expert-level and sometimes even super-expert-level performance, but achieving such performance demands massive annotated data for training (e.g., Google's proprietary CXR Foundation Model (CXR-FM) was trained on 821,544 labeled and mostly private chest X-rays (CXRs)). Numerous datasets are publicly available in medical imaging but individually small and heterogeneous in expert labels. We envision a powerful and robust foundation model that can be trained by aggregating numerous small public datasets. To realize this vision, we have developed Ark, a framework that accrues and reuses knowledge from heterogeneous expert annotations in various datasets. As a proof of concept, we have trained two Ark models on 335,484 and 704,363 CXRs, respectively, by merging several datasets including ChestX-ray14, CheXpert, MIMIC-II, and VinDr-CXR, evaluated them on a wide range of imaging tasks covering both classification and segmentation via fine-tuning, linear-probing, and gender-bias analysis, and demonstrated our Ark's superior and robust performance over the SOTA fully/self-supervised baselines and Google's proprietary CXR-FM. This enhanced performance is attributed to our simple yet powerful observation that aggregating numerous public datasets diversifies patient populations and accrues knowledge from diverse experts, yielding unprecedented performance yet saving annotation cost. With all codes and pretrained models released at GitHub.com/JLiangLab/Ark, we hope that Ark exerts an important impact on open science, as accruing and reusing knowledge from expert annotations in public datasets can potentially surpass the performance of proprietary models trained on unusually large data, inspiring many more researchers worldwide to share codes and datasets to build open foundation models, accelerate open science, and democratize deep learning for medical imaging.

CVMar 12, 2025Code
Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis

Nahid Ul Islam, DongAo Ma, Jiaxuan Pang et al.

Developing robust and versatile deep-learning models is essential for enhancing diagnostic accuracy and guiding clinical interventions in medical imaging, but it requires a large amount of annotated data. The advancement of deep learning has facilitated the creation of numerous medical datasets with diverse expert-level annotations. Aggregating these datasets can maximize data utilization and address the inadequacy of labeled data. However, the heterogeneity of expert-level annotations across tasks such as classification, localization, and segmentation presents a significant challenge for learning from these datasets. To this end, we introduce nFoundation X, an end-to-end framework that utilizes diverse expert-level annotations from numerous public datasets to train a foundation model capable of multiple tasks including classification, localization, and segmentation. To address the challenges of annotation and task heterogeneity, we propose a Lock-Release pretraining strategy to enhance the cyclic learning from multiple datasets, combined with the student-teacher learning paradigm, ensuring the model retains general knowledge for all tasks while preventing overfitting to any single task. To demonstrate the effectiveness of Foundation X, we trained a model using 11 chest X-ray datasets, covering annotations for classification, localization, and segmentation tasks. Our experimental results show that Foundation X achieves notable performance gains through extensive annotation utilization, excels in cross-dataset and cross-task learning, and further enhances performance in organ localization and segmentation tasks. All code and pretrained models are publicly accessible at https://github.com/jlianglab/Foundation_X.

CVSep 13, 2018
SiftingGAN: Generating and Sifting Labeled Samples to Improve the Remote Sensing Image Scene Classification Baseline in vitro

Dongao Ma, Ping Tang, Lijun Zhao

Lack of annotated samples greatly restrains the direct application of deep learning in remote sensing image scene classification. Although researches have been done to tackle this issue by data augmentation with various image transformation operations, they are still limited in quantity and diversity. Recently, the advent of the unsupervised learning based generative adversarial networks (GANs) bring us a new way to generate augmented samples. However, such GAN-generated samples are currently only served for training GANs model itself and for improving the performance of the discriminator in GANs internally (in vivo). It becomes a question of serious doubt whether the GAN-generated samples can help better improve the scene classification performance of other deep learning networks (in vitro), compared with the widely used transformed samples. To answer this question, this paper proposes a SiftingGAN approach to generate more numerous, more diverse and more authentic labeled samples for data augmentation. SiftingGAN extends traditional GAN framework with an Online-Output method for sample generation, a Generative-Model-Sifting method for model sifting, and a Labeled-Sample-Discriminating method for sample sifting. Experiments on the well-known AID dataset demonstrate that the proposed SiftingGAN method can not only effectively improve the performance of the scene classification baseline that is achieved without data augmentation, but also significantly excels the comparison methods based on traditional geometric/radiometric transformation operations.