CV, May 21

Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection

arXiv:2605.22942
3.8
AI Analysis

For the specific task of vision-to-chart buoy association in maritime navigation, this incremental modification improves performance over a baseline DETR-based fusion transformer.

This work improves buoy association from chart to image by explicitly predicting buoy pixel coordinates via a learned MLP (QueryMLP) from world-space and IMU data, reducing the geometric reasoning burden on the transformer decoder. The method achieves an Overall score of 0.7386 (F1=0.8055, mIoU=0.6718) on the MaCVi 2026 challenge test set, ranking second.
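As a quick arithmetic check on these figures, the reported Overall score is consistent with the arithmetic mean of F1 and mIoU. That the challenge defines Overall this way is an assumption inferred from the numbers, not something stated here.

```latex
\[
\frac{\mathrm{F1} + \mathrm{mIoU}}{2}
  = \frac{0.8055 + 0.6718}{2}
  = 0.73865
\]
```

This agrees with the reported 0.7386 up to rounding of the published F1 and mIoU values.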

This report presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. The challenge baseline decoder receives per-buoy queries encoding world-space distance and bearing, forcing the transformer to implicitly learn the complex geometric projection from world coordinates to image pixels. Instead, this work trains an additional dedicated MLP, QueryMLP, to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. On the challenge leaderboard, the presented approach achieves an Overall score of 0.7386, with F1 = 0.8055 and mIoU = 0.6718, on the held-out test set, placing second among all submissions.
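The query construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the input encoding (`encode_inputs`), the normalisation range `max_range_m`, the sigmoid output, and the helper `build_decoder_query` are all hypothetical, and the weights are random rather than trained. Only the overall idea (an MLP maps chart measurements and IMU orientation to a pixel location, which is appended to the per-buoy query) comes from the text.

```python
import math
import random


def init_linear(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply y = W x + b for a layer given as (weights, biases)."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


class QueryMLP:
    """Hypothetical two-layer MLP: features -> normalised pixel (u, v)."""

    def __init__(self, n_in=5, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.hidden = init_linear(n_in, n_hidden, rng)
        self.out = init_linear(n_hidden, 2, rng)

    def __call__(self, features):
        h = relu(linear(features, self.hidden))
        # Sigmoid keeps the predicted waterline point inside the image,
        # expressed as fractions of image width and height (assumption).
        return sigmoid(linear(h, self.out))


def encode_inputs(distance_m, bearing_rad, roll_rad, pitch_rad,
                  max_range_m=1000.0):
    """Assumed feature encoding for chart measurements and IMU orientation."""
    return [distance_m / max_range_m,
            math.sin(bearing_rad), math.cos(bearing_rad),
            roll_rad, pitch_rad]


def build_decoder_query(base_query, pixel_prior):
    """Append the predicted pixel coordinates to the baseline query vector."""
    return list(base_query) + list(pixel_prior)


# Example: one charted buoy 250 m away, 10 degrees off the bow.
mlp = QueryMLP()
features = encode_inputs(250.0, math.radians(10.0), 0.01, -0.02)
uv = mlp(features)                            # untrained, so values are arbitrary
base_query = [250.0 / 1000.0, math.radians(10.0)]  # distance and bearing
query = build_decoder_query(base_query, uv)
```

In a real system the MLP would be trained against annotated waterline contact points, and the resulting `query` would be embedded and passed to the transformer decoder; both steps are omitted here.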
