CV | Jul 5, 2024

Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

arXiv:2407.04242v1 | 8.78 citations | h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in medical imaging for ultrasound reconstruction, with incremental advancements in method design.

The paper tackles the challenge of fine-grained spatio-temporal learning for freehand 3D ultrasound reconstruction by proposing ReMamba, which uses a state space model to manage long-range dependencies, along with adaptive fusion and online alignment strategies. The method achieves remarkable improvement over competitors on two large-scale datasets.

Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly relied on coarse-grained spatial features and on learning temporal dependencies separately, and therefore struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging because long-range dependencies are difficult to learn. In this context, we propose a novel method that exploits the long-range dependency management capabilities of the state space model (SSM) to address this challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information to enhance spatio-temporal perception. Last, we design an online alignment strategy that encodes the temporal information as pseudo labels for multi-modal alignment to further improve reconstruction performance. Extensive experimental validation on two large-scale datasets shows that our method improves remarkably over competitors.
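
As a generic illustration of the kind of recurrence a state space model applies along a sequence, and not the ReMamba architecture itself, the sketch below runs a scalar linear SSM forward and backward over a sequence and sums the two passes. The function names `ssm_scan` and `bidirectional_scan` and the parameters `a`, `b`, `c` are assumptions made for this example; the abstract does not specify how the paper's multi-directional SSM is built.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space recurrence with zero initial state.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t  # state carries information from earlier steps
        y.append(c * h)
    return y


def bidirectional_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Sum a forward scan and a backward scan (hypothetical two-direction variant)."""
    forward = ssm_scan(x, a, b, c)
    # Scan the reversed sequence, then flip the outputs back to the original order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse at the first step decays geometrically in the forward pass.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
    print(bidirectional_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

With `|a| < 1` the influence of an input decays geometrically with distance, which is why the state can summarize long sequences at constant per-step cost; scanning in more than one direction lets each position see context on both sides.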
