Carlos Garcia

h-index: 16
2 papers

2 Papers

IV · May 10, 2025 · Code
Reproducing and Improving CheXNet: Deep Learning for Chest X-ray Disease Classification

Daniel Strick, Carlos Garcia, Anthony Huang

Deep learning for radiologic image analysis is a rapidly growing field in biomedical research and is likely to become standard practice in modern medicine. Using the publicly available NIH ChestX-ray14 dataset, whose X-ray images are labeled for the presence or absence of 14 different diseases, we reproduced the CheXNet algorithm and explored other algorithms that outperform CheXNet's baseline metrics. Model performance was evaluated primarily with the F1 score and AUC-ROC, both of which are critical metrics for imbalanced, multi-label classification tasks in medical imaging. The best model achieved an average AUC-ROC of 0.85 and an average F1 score of 0.39 across all 14 disease classes in the dataset.
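
As a rough illustration of the two metrics named in this abstract, the sketch below computes a per-label AUC-ROC and F1 score and averages them over labels. It is not the authors' evaluation code: the unweighted (macro) averaging, the 0.5 decision threshold, and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC for one label, using the pairwise-ranking (Mann-Whitney) identity."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC is undefined unless both classes are present")
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 for one label after thresholding scores (threshold is an assumption)."""
    pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_average(y_true: List[List[int]], y_score: List[List[float]],
                  threshold: float = 0.5) -> Tuple[float, float]:
    """Unweighted mean of per-label AUC-ROC and F1 over all label columns."""
    n_labels = len(y_true[0])
    aucs, f1s = [], []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_score = [row[j] for row in y_score]
        aucs.append(auc_roc(col_true, col_score))
        f1s.append(f1_score(col_true, col_score, threshold))
    return sum(aucs) / n_labels, sum(f1s) / n_labels


# Toy data: 4 images, 2 labels (rows are images, columns are labels).
toy_true = [[1, 0], [0, 0], [1, 1], [0, 0]]
toy_score = [[0.9, 0.2], [0.1, 0.6], [0.8, 0.7], [0.3, 0.1]]
mean_auc, mean_f1 = macro_average(toy_true, toy_score)
```

On the toy data the ranking is perfect for both labels, yet the second label picks up a false positive at the fixed threshold, so the averaged F1 falls below the averaged AUC-ROC. AUC-ROC depends only on how scores are ordered, while F1 also depends on where the threshold is set.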

CL · Apr 1, 2025
TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet

Luis Felipe, Carlos Garcia, Issam El Naqa et al.

Robust and diverse datasets are critical for training clinical large language models (cLLMs), yet currently available public repositories often prove too limited in size or scope for comprehensive medical use. While resources like PubMed provide foundational medical literature, they capture only a narrow range of formal publications and omit the broader medical discourse on the internet. To address these deficits, we introduce TheBlueScrubs-v1, a curated dataset of over 25 billion medical tokens, nearly three times larger than PubMed, drawn from a broad-scale internet corpus. Our two-stage filtering pipeline employs a logistic regression model for document screening (achieving an AUC of approximately 0.95 on external validation), followed by verification via a 70B-parameter Llama 3.1 instruct model. Each text is assigned three LLM-based quality scores covering medical relevance, precision and factual detail, and safety and ethical standards. Clinician reviews confirm high concordance with these automated evaluations, and a specialized cancer classifier further labels approximately 11 billion oncology tokens. Two demonstration tasks highlight the dataset's practical value: first, we distill the safety evaluations to a smaller BERT-style model that reaches an AUC near 0.96 on unseen data; second, we fine-tune a compact LLM on a filtered subset, showing measurable improvements over standard baselines on medical benchmarks as well as private ones. This Data Descriptor details the dataset's creation and validation, underscoring its potential utility for medical AI research.
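
The two-stage filtering idea described in this abstract (a cheap screen followed by a more expensive verifier) can be sketched generically as below. This is a minimal sketch under stated assumptions, not the authors' pipeline: `keyword_score` is a hypothetical stand-in for the logistic regression screen, `stub_verifier` is a hypothetical stand-in for the LLM verification step, and the threshold value is arbitrary.

```python
from typing import Callable, Iterable, List

# Hypothetical vocabulary for the toy screen; not taken from the dataset.
MEDICAL_TERMS = {"patient", "dose", "tumor", "diagnosis"}


def keyword_score(doc: str) -> float:
    """Toy stage-1 score: fraction of words that are in MEDICAL_TERMS."""
    words = [w.strip(".,").lower() for w in doc.split()]
    return sum(w in MEDICAL_TERMS for w in words) / max(len(words), 1)


def stub_verifier(doc: str) -> bool:
    """Toy stage-2 check standing in for a slower, stronger model."""
    return "miracle cure" not in doc.lower()


def two_stage_filter(
    docs: Iterable[str],
    cheap_score: Callable[[str], float],
    verify: Callable[[str], bool],
    screen_threshold: float = 0.2,  # arbitrary value chosen for the example
) -> List[str]:
    """Keep documents that pass the cheap screen and then the verifier."""
    kept = []
    for doc in docs:
        # Stage 1: inexpensive screen applied to every document.
        if cheap_score(doc) < screen_threshold:
            continue
        # Stage 2: costly verification, run only on stage-1 survivors.
        if verify(doc):
            kept.append(doc)
    return kept


docs = [
    "The patient received a reduced dose after diagnosis.",
    "Best pizza recipes for the weekend.",
    "Miracle cure shrinks tumor, patient dose tumor.",
]
kept = two_stage_filter(docs, keyword_score, stub_verifier)
```

Ordering the stages this way means the expensive check runs only on documents the cheap screen has already accepted, which is the usual motivation for a cascade of this kind.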