The FathomNet2023 Competition Dataset
The dataset addresses a critical need in ocean science and environmental monitoring by enabling better use of visual data, though the contribution is incremental: it focuses on dataset creation rather than a new method.
The paper tackles the problem of handling extreme variability and distribution shifts in marine visual data by introducing the FathomNet2023 competition dataset, which challenges models to identify organisms and detect out-of-sample images when target data differs from training data.
Ocean scientists have been collecting visual data to study marine organisms for decades. These images and videos are extremely valuable both for basic science and environmental monitoring tasks. There are tools for automatically processing these data, but none that are capable of handling the extreme variability in sample populations, image quality, and habitat characteristics that are common in visual sampling of the ocean. Such distribution shifts can occur over very short physical distances and in narrow time windows. Creating models that are able to recognize when an image or video sequence contains a new organism, an unusual collection of animals, or is otherwise out-of-sample is critical to fully leverage visual data in the ocean. The FathomNet2023 competition dataset presents a realistic scenario where the set of animals in the target data differs from the training data. The challenge is both to identify the organisms in a target image and assess whether it is out-of-sample.
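To make the two-part challenge concrete, the following is a minimal sketch of what a prediction for a single target image might look like: a set of organism labels plus an out-of-sample score. It assumes a hypothetical model that emits one confidence in [0, 1] per category, and it uses a simple max-confidence baseline for the out-of-sample score. The function name, the threshold, and the category names are illustrative assumptions, not part of the dataset or the competition's specification.

```python
def predict_with_out_of_sample_score(scores, labels, threshold=0.5):
    """Return (predicted_labels, out_of_sample_score) for one image.

    scores: per-category confidences in [0, 1] from a hypothetical classifier.
    labels: category names aligned with scores (placeholders here).
    threshold: assumed cutoff for reporting a category as present.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must not be empty")

    # Multi-label identification: keep every category above the cutoff.
    predicted = [label for label, s in zip(labels, scores) if s >= threshold]

    # Max-confidence baseline (an assumption, not the competition's method):
    # if the model is not confident about any known category, the image is
    # more likely to be out-of-sample.
    out_of_sample_score = 1.0 - max(scores)
    return predicted, out_of_sample_score


# Hypothetical category names and scores, for illustration only.
labels = ["category_a", "category_b", "category_c"]
predicted, osd = predict_with_out_of_sample_score([0.9, 0.2, 0.7], labels)
print(predicted, round(osd, 3))
```

A score near 1 flags an image the model cannot place among its training categories. Stronger out-of-sample detectors exist, and this baseline is shown only to illustrate the shape of the task.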