SD | AI | AS | Jun 10, 2024

BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

arXiv:2406.06786v2 | 23 citations
Originality: Incremental advance
AI Analysis

This work addresses respiratory sound classification for medical diagnostics. It is incremental because it builds on existing multimodal approaches, adding metadata as an extra input.

The paper tackles respiratory sound classification by introducing a text-audio multimodal model that leverages metadata such as patient demographics and recording details. It achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by 1.17%.

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata, which includes the gender and age of patients, the type of recording device, and the recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance in the case where metadata is partially unavailable, which may occur in real-world clinical settings.
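
To make the idea of free-text descriptions derived from metadata concrete, here is a minimal sketch of turning the four metadata fields named in the abstract into a text prompt. It skips any field that is missing, which mirrors the partially unavailable metadata case. The function name `metadata_to_text`, the prompt wording, and the example field values are assumptions for illustration; the abstract does not specify the template the authors use.

```python
from typing import Optional


def metadata_to_text(
    age_group: Optional[str] = None,
    sex: Optional[str] = None,
    location: Optional[str] = None,
    device: Optional[str] = None,
) -> str:
    """Build a free-text description from whichever metadata fields are present.

    Hypothetical template: the wording is an assumption, not the paper's prompt.
    """
    fields = [
        ("age group", age_group),
        ("sex", sex),
        ("recording location", location),
        ("recording device", device),
    ]
    # Keep only the fields that are available, so missing metadata is omitted
    # from the description instead of being filled with a placeholder.
    parts = [f"{name}: {value}" for name, value in fields if value]
    if not parts:
        return "Respiratory sound."
    return "Respiratory sound, " + ", ".join(parts) + "."


if __name__ == "__main__":
    # Illustrative values only.
    print(metadata_to_text("adult", "male", "left anterior chest", "Meditron stethoscope"))
    print(metadata_to_text(sex="female", device="Meditron stethoscope"))
    print(metadata_to_text())
```

In a pipeline like the one the abstract describes, a string of this kind would be passed to the text encoder of the pretrained text-audio model alongside the audio sample. How the two are combined is not stated in the abstract and is not sketched here.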

Code Implementations: 1 repo