A Multimodal Data Fusion Generative Adversarial Network for Real Time Underwater Sound Speed Field Construction
This work addresses the challenge of high-precision underwater sound speed field construction for applications such as acoustic communication and positioning, without the need for costly on-site data collection.
The paper addresses the estimation of underwater sound speed profiles without on-site sonar measurements by proposing a multimodal data-fusion generative adversarial network with residual attention blocks (MDF-RAGAN). The model achieves an error of less than 0.3 m/s, outperforming a convolutional neural network (CNN) and spatial interpolation by nearly a factor of two and reducing root mean square error by about 65.8% compared with the mean profile.
Sound speed profiles (SSPs) are essential underwater parameters that affect the propagation mode of underwater signals and have a critical impact on the energy efficiency of underwater acoustic communication and the accuracy of underwater acoustic positioning. Traditionally, SSPs can be obtained by matched field processing (MFP), compressive sensing (CS), and deep learning (DL) methods. However, existing methods mainly rely on on-site underwater sonar observation data, which imposes strict requirements on the deployment of sonar observation systems. To achieve high-precision estimation of the sound velocity distribution in a given sea area without on-site underwater data measurement, we propose a multimodal data-fusion generative adversarial network model with residual attention blocks (MDF-RAGAN) for SSP construction. To improve the model's ability to capture global spatial feature correlations, we embed attention mechanisms, and we use residual modules to capture small disturbances in the deep-ocean sound velocity distribution caused by changes in sea surface temperature (SST). Experimental results on a real, open dataset show that the proposed model outperforms other state-of-the-art methods, achieving an error of less than 0.3 m/s. Specifically, MDF-RAGAN not only outperforms a convolutional neural network (CNN) and spatial interpolation (SITP) by nearly a factor of two, but also achieves a root mean square error (RMSE) reduction of about 65.8% compared with the mean profile, which reflects how multi-source fusion and cross-modal attention improve overall profile matching.
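As an illustration of how a relative RMSE reduction against a mean-profile baseline can be computed, the following minimal sketch may help. It is not the authors' evaluation code: the function names `rmse` and `rmse_reduction_percent` are hypothetical, the sound speed values are invented for illustration only, and the reduction is assumed to be defined as (baseline RMSE minus model RMSE) divided by baseline RMSE.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two profiles sampled at the same depths."""
    if len(predicted) != len(reference):
        raise ValueError("profiles must have the same number of depth samples")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_percent(model_rmse, baseline_rmse):
    """Relative RMSE reduction of a model against a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical sound speeds in m/s at five depths (illustrative values, not from the paper).
measured = [1540.0, 1535.0, 1528.0, 1520.0, 1515.0]
mean_profile = [1541.0, 1536.0, 1529.0, 1521.0, 1516.0]  # baseline, off by 1 m/s at every depth
model_output = [1540.5, 1534.5, 1528.5, 1519.5, 1515.5]  # off by 0.5 m/s at every depth

baseline_rmse = rmse(mean_profile, measured)
model_rmse = rmse(model_output, measured)
reduction = rmse_reduction_percent(model_rmse, baseline_rmse)

print(f"baseline RMSE: {baseline_rmse:.2f} m/s")
print(f"model RMSE:    {model_rmse:.2f} m/s")
print(f"reduction:     {reduction:.1f}%")
```

With these toy values the model halves the baseline error, so the computed reduction is 50%. Under the same assumed definition, the reported figure of about 65.8% would correspond to a model RMSE of roughly one third of the mean-profile RMSE.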