IV | AI | CV | Sep 21, 2025

A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories

arXiv:2509.17046v2 | 5 citations | h-index: 5 | Sci Data
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the shortage of public benchmarks for breast ultrasound AI, particularly for rare cases. The contribution is incremental, since it builds on existing dataset creation efforts.

The authors tackled the lack of high-quality breast ultrasound datasets for AI development by introducing BUS-CoT, a dataset with 11,439 images covering all 99 histopathology types, annotated with chain-of-thought reasoning processes to support robust AI systems.

Breast ultrasound (BUS) is an essential tool for diagnosing breast lesions, with millions of examinations per year. However, publicly available high-quality BUS benchmarks for AI development are limited in data scale and annotation richness. In this work, we present BUS-CoT, a BUS dataset for chain-of-thought (CoT) reasoning analysis, which contains 11,439 images of 10,019 lesions from 4,838 patients and covers all 99 histopathology types. To facilitate research on incentivizing CoT reasoning, we construct the reasoning processes based on observation, feature, diagnosis and pathology labels, annotated and verified by experienced experts. Moreover, by covering lesions of all histopathology types, we aim to facilitate robust AI systems in rare cases, which can be error-prone in clinical practice.
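
As a rough illustration of how the four annotation levels named in the abstract (observation, feature, diagnosis, pathology) could be chained into a single reasoning trace, here is a minimal Python sketch. The class `BusCotRecord`, its field names, the function `build_reasoning_chain`, and the sample values are all hypothetical and chosen for illustration; the abstract does not describe the released file format, so this should not be read as the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusCotRecord:
    """Hypothetical record layout; field names are illustrative only."""
    image_path: str       # path to one ultrasound image
    patient_id: str       # a patient may have several lesions
    lesion_id: str        # a lesion may have several images
    observation: str      # free-text description of what is visible
    features: List[str]   # lesion descriptors derived from the observation
    diagnosis: str        # diagnostic assessment
    pathology: str        # histopathology type label


def build_reasoning_chain(record: BusCotRecord) -> str:
    """Join the four annotation levels into one ordered reasoning string."""
    steps = [
        f"Observation: {record.observation}",
        f"Features: {', '.join(record.features)}",
        f"Diagnosis: {record.diagnosis}",
        f"Pathology: {record.pathology}",
    ]
    return "\n".join(steps)


# Invented example values, not taken from the dataset.
example = BusCotRecord(
    image_path="images/example_0001.png",
    patient_id="patient-0001",
    lesion_id="lesion-0001",
    observation="A single mass is visible in the image.",
    features=["oval shape", "circumscribed margin"],
    diagnosis="likely benign",
    pathology="fibroadenoma",
)
print(build_reasoning_chain(example))
```

Keeping the levels as separate fields, rather than storing only the joined string, would let a training or evaluation script supervise or score each step of the chain independently.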
