CV · AI · Mar 2, 2025

MTReD: 3D Reconstruction Dataset for Fly-over Videos of Maritime Domain

arXiv:2503.00853v1 · 2 citations · h-index: 4 · Has Code · 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem in maritime 3D reconstruction by providing a new dataset and evaluation metric, though the contribution is incremental because it builds on existing methods such as SfM and MASt3R.

The authors address the lack of a dataset for 3D scene reconstruction from fly-over videos in the maritime domain by introducing MTReD, a dataset of 19 videos featuring ships, islands, and coastlines. They also propose DiFPS, a semantic similarity metric that they find measures reconstruction completeness better than existing perception-based metrics such as LPIPS.
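DiFPS is described only as a semantic similarity metric built on DINOv2 features; its exact formula is not given here. The sketch below is therefore an assumption, not the authors' definition: it scores a reconstruction as the mean cosine similarity between per-image embeddings of the original frames and of the corresponding rendered views. The function names `cosine_similarity` and `feature_similarity_score` are hypothetical, and feature extraction is left out; in practice the embeddings would come from a DINOv2 backbone (for example one of the models published in the facebookresearch/dinov2 repository).

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two feature vectors (flattened if needed)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    # eps guards against division by zero for all-zero embeddings.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def feature_similarity_score(ref_feats, rec_feats):
    """Hypothetical DiFPS-style score: mean cosine similarity over image pairs.

    ref_feats, rec_feats: arrays of shape (num_images, dim) holding one
    embedding per image (assumed input, e.g. a DINOv2 global feature).
    Row i of each array must describe the same viewpoint.
    """
    ref = np.asarray(ref_feats, dtype=np.float64)
    rec = np.asarray(rec_feats, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("reference and reconstructed features must align")
    scores = [cosine_similarity(r, c) for r, c in zip(ref, rec)]
    return float(np.mean(scores))


# Toy usage with 2-D stand-in embeddings for two image pairs.
reference = [[1.0, 0.0], [0.0, 1.0]]
reconstructed = [[1.0, 0.0], [1.0, 1.0]]
score = feature_similarity_score(reference, reconstructed)
```

Unlike a patch-level perceptual distance, a similarity over semantic embeddings drops when whole objects (for example a ship or part of a coastline) are missing from the rendered view, which is the kind of incompleteness the authors report LPIPS fails to capture.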

This work tackles 3D scene reconstruction from fly-over videos in the maritime domain, with a specific emphasis on geometrically and visually sound reconstructions. Such reconstructions enable downstream tasks such as segmentation, navigation, and localization. To our knowledge, no dataset is available in this domain. We therefore propose a novel maritime 3D scene reconstruction benchmarking dataset, MTReD (Maritime Three-Dimensional Reconstruction Dataset). MTReD comprises 19 fly-over videos curated from the Internet, containing ships, islands, and coastlines. Because the task targets geometric consistency and visual completeness, the dataset uses two kinds of metrics: (1) reprojection error; and (2) perception-based metrics. We find that existing perception-based metrics, such as Learned Perceptual Image Patch Similarity (LPIPS), do not appropriately measure the completeness of a reconstructed image. Thus, we propose a novel semantic similarity metric based on DINOv2 features, coined DiFPS (DINOv2 Features Perception Similarity). We perform an initial evaluation of two baselines: (1) Structure from Motion (SfM) through COLMAP; and (2) the recent state-of-the-art MASt3R model. We find that scenes reconstructed by MASt3R have higher reprojection errors but superior perception-based metric scores. To address this trade-off, we explore several pre-processing methods and find one that improves both the reprojection error and the perception-based score. We envisage that MTReD will stimulate further research in these directions. The dataset and all the code will be made available at https://github.com/RuiYiYong/MTReD.
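Reprojection error, the geometric metric named above, is commonly computed by projecting reconstructed 3D points back into each camera and measuring the pixel distance to the observed 2D keypoints. The sketch below is a minimal, generic version assuming an ideal pinhole camera with intrinsics `K` and pose `(R, t)` and no lens distortion; it is not the MTReD evaluation code, and `project_points` and `mean_reprojection_error` are hypothetical names.

```python
import numpy as np


def project_points(points_3d, K, R, t):
    """Project (N, 3) world points to (N, 2) pixel coordinates (pinhole model)."""
    cam = points_3d @ R.T + t  # world frame -> camera frame, shape (N, 3)
    if np.any(cam[:, 2] <= 0):
        raise ValueError("all points must lie in front of the camera (z > 0)")
    uvw = cam @ K.T  # apply intrinsics, shape (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> pixel coordinates


def mean_reprojection_error(points_3d, observed_2d, K, R, t):
    """Mean Euclidean distance (pixels) between projected and observed points."""
    projected = project_points(points_3d, K, R, t)
    return float(np.mean(np.linalg.norm(projected - observed_2d, axis=1)))


# Toy usage: one camera at the origin looking down +z.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 4.0], [1.0, 0.0, 4.0]])  # project to (320, 240), (520, 240)
observed = np.array([[321.0, 240.0], [520.0, 243.0]])  # off by 1 px and 3 px
error = mean_reprojection_error(points, observed, K, R, t)  # -> 2.0
```

In a multi-view reconstruction the same per-point distances would be averaged over every camera in which each point is observed; a lower value indicates a more geometrically consistent reconstruction.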

