CV · AI · LG · Nov 26, 2025

BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model

arXiv:2511.20956v1 · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of paired image-report datasets and the risk of hallucination in automated breast ultrasound reporting for clinicians. It is an incremental advance, since it builds on existing vision-language methods.

The paper tackles automated radiology report generation for breast ultrasound by proposing BUSTR, a multitask vision-language framework that generates reports without paired image-report supervision and improves both standard natural language generation metrics and clinical efficacy metrics on two public datasets.

Automated radiology report generation (RRG) for breast ultrasound (BUS) is limited by the lack of paired image-report datasets and the risk of hallucinations from large language models. We propose BUSTR, a multitask vision-language framework that generates BUS reports without requiring paired image-report supervision. BUSTR constructs reports from structured descriptors (e.g., BI-RADS, pathology, histology) and radiomics features, learns descriptor-aware visual representations with a multi-head Swin encoder trained using a multitask loss over dataset-specific descriptor sets, and aligns visual and textual tokens via a dual-level objective that combines token-level cross-entropy with a cosine-similarity alignment loss between input and output representations. We evaluate BUSTR on two public BUS datasets, BrEaST and BUS-BRA, which differ in size and available descriptors. Across both datasets, BUSTR consistently improves standard natural language generation metrics and clinical efficacy metrics, particularly for key targets such as BI-RADS category and pathology. Our results show that this descriptor-aware vision model, trained with a combined token-level and alignment loss, improves both automatic report metrics and clinical efficacy without requiring paired image-report data. The source code can be found at https://github.com/AAR-UNLV/BUSTR
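The two training objectives named in the abstract (a multitask loss over descriptor heads, and a dual-level loss combining token-level cross-entropy with a cosine-similarity alignment term) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the BUSTR implementation: it assumes each descriptor (e.g., BI-RADS, pathology) has its own classification head and that the per-head cross-entropies are averaged with equal weights. It also assumes the "input and output representations" are pooled vectors of equal length and that the alignment term is `1 - cos(input, output)` scaled by a hypothetical weight `lam`. All function names are illustrative and do not come from the repository.

```python
import math
from typing import Dict, List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_descriptor_loss(head_logits: Dict[str, Sequence[float]],
                              head_targets: Dict[str, int]) -> float:
    """Average cross-entropy over descriptor heads (one head per descriptor).

    Assumption: heads are weighted equally; the paper may weight them differently.
    """
    losses = [cross_entropy(head_logits[name], head_targets[name])
              for name in head_targets]
    return sum(losses) / len(losses)


def token_cross_entropy(token_logits: List[Sequence[float]],
                        token_targets: List[int]) -> float:
    """Mean token-level cross-entropy over a report sequence (teacher forcing)."""
    losses = [cross_entropy(z, t) for z, t in zip(token_logits, token_targets)]
    return sum(losses) / len(losses)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; epsilon avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def dual_level_loss(token_logits: List[Sequence[float]],
                    token_targets: List[int],
                    input_repr: Sequence[float],
                    output_repr: Sequence[float],
                    lam: float = 0.5) -> float:
    """Token-level CE plus a (1 - cosine) alignment penalty.

    `lam` is a hypothetical weighting hyperparameter, not a value from the paper.
    """
    ce = token_cross_entropy(token_logits, token_targets)
    align = 1.0 - cosine_similarity(input_repr, output_repr)
    return ce + lam * align


if __name__ == "__main__":
    # Toy descriptor heads with hypothetical logits.
    heads = {"birads": [2.0, 0.5, -1.0], "pathology": [0.1, 1.2]}
    targets = {"birads": 0, "pathology": 1}
    print("descriptor loss:", multitask_descriptor_loss(heads, targets))

    # Toy two-token report with a partially aligned input/output pair.
    print("dual-level loss:",
          dual_level_loss([[1.0, 0.0], [0.0, 2.0]], [0, 1],
                          [1.0, 0.5], [0.8, 0.6]))
```

In a real model the logits would come from the Swin encoder's descriptor heads and the language decoder, and the representations from their embedding layers; plain lists are used here only to keep the sketch dependency-free. The alignment term is zero when the two representations point in the same direction and grows as they diverge, which is the behaviour a cosine-similarity alignment loss is meant to encourage.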
