CV | Oct 27, 2021

Vision Transformer for Classification of Breast Ultrasound Images

arXiv:2110.14731v3 | 13.5 | 108 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses breast cancer diagnosis in medical imaging, but it is incremental: it applies an existing ViT method to a new domain.

The study tackled breast cancer classification from ultrasound images using Vision Transformers (ViT) for the first time, achieving comparable or better accuracy and AUC than state-of-the-art CNNs.

Medical ultrasound (US) imaging has become a prominent modality for breast cancer imaging due to its ease of use, low cost, and safety. In the past decade, convolutional neural networks (CNNs) have emerged as the method of choice in vision applications and have shown excellent potential in automatic classification of US images. Despite their success, their restricted local receptive field limits their ability to learn global context information. Recently, Vision Transformer (ViT) designs, which are based on self-attention between image patches, have shown great potential as an alternative to CNNs. In this study, for the first time, we utilize ViT to classify breast US images using different augmentation strategies. The results are reported as classification accuracy and Area Under the Curve (AUC) metrics, and the performance is compared with that of state-of-the-art CNNs. The results indicate that the ViT models achieve efficiency comparable to, or even better than, that of the CNNs in classification of breast US images.
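
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, the binary labeling (1 = malignant, 0 = benign), and the sample values are all assumptions made for illustration. AUC is computed here in its pairwise form, as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]

acc = accuracy(labels, scores)
roc_auc = auc(labels, scores)
print(f"accuracy={acc:.3f}, AUC={roc_auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the examples, which is one reason the two are often reported together when comparing classifiers.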
