CVJul 10, 2025

Dual Semantic-Aware Network for Noise Suppressed Ultrasound Video Segmentation

arXiv:2507.07443v1h-index: 8Has Code
Originality Incremental advance
AI Analysis

This work addresses the challenge of noise in ultrasound video segmentation for medical diagnostics, presenting an incremental improvement with novel modules for feature integration.

The paper tackles the problem of noise interference in automated lesion or organ segmentation from ultrasound video sequences by proposing the Dual Semantic-Aware Network (DSANet), which enhances noise robustness through mutual semantic awareness between local and global features, resulting in substantially outperforming state-of-the-art methods in segmentation accuracy and achieving higher inference FPS than video-based methods.

Ultrasound imaging is a prevalent diagnostic tool known for its simplicity and non-invasiveness. However, its inherent characteristics often introduce substantial noise, posing considerable challenges for automated lesion or organ segmentation in ultrasound video sequences. To address these limitations, we propose the Dual Semantic-Aware Network (DSANet), a novel framework designed to enhance noise robustness in ultrasound video segmentation by fostering mutual semantic awareness between local and global features. Specifically, we introduce an Adjacent-Frame Semantic-Aware (AFSA) module, which constructs a channel-wise similarity matrix to guide feature fusion across adjacent frames, effectively mitigating the impact of random noise without relying on pixel-level relationships. Additionally, we propose a Local-and-Global Semantic-Aware (LGSA) module that reorganizes and fuses temporal unconditional local features, which capture spatial details independently at each frame, with conditional global features that incorporate temporal context from adjacent frames. This integration facilitates multi-level semantic representation, significantly improving the model's resilience to noise interference. Extensive evaluations on four benchmark datasets demonstrate that DSANet substantially outperforms state-of-the-art methods in segmentation accuracy. Moreover, since our model avoids pixel-level feature dependencies, it achieves significantly higher inference FPS than video-based methods, and even surpasses some image-based models. Code can be found in \href{https://github.com/ZhouL2001/DSANet}{DSANet}

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes