Video-based Hierarchical Species Classification for Longline Fishing Monitoring
This work provides a more efficient and accurate method for identifying fish species from video, which is crucial for electronic monitoring of longline fishing and for reducing the manual effort required of human reviewers.
This paper addresses the challenge of identifying fish species from video footage of longline fishing, where fish are often deformed or occluded. Leveraging a known non-overlapping hierarchical data structure provided by fisheries scientists, the authors propose a hierarchical classification method that significantly outperforms traditional flat classification systems.
The goal of electronic monitoring (EM) of longline fishing is to monitor fishing activity on vessels, either for regulatory compliance or for catch counting. Video-based hierarchical classification allows inexpensive and efficient species identification of longline catches, where fish undergo severe deformation and self-occlusion during the catching process. More importantly, the flexibility of hierarchical classification reduces the laborious effort of human review by providing confidence scores at different hierarchical levels. Related work either uses cascaded models for hierarchical classification, makes predictions per image, or predicts an overlapping hierarchical data structure of the dataset in advance. In contrast, given a known non-overlapping hierarchical data structure provided by fisheries scientists, our method enforces that structure and introduces an efficient training and inference strategy for video-based fisheries data. Our experiments show that the proposed method significantly outperforms the classic flat classification system, and our ablation study justifies our contributions in CNN model design, training strategy, and video-based inference schemes for the hierarchical fish species classification task.
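The idea of reporting confidence scores at different hierarchical levels over a video can be illustrated with a minimal sketch. This is not the authors' model or inference scheme: the two-level hierarchy, the placeholder labels (`group_x`, `species_a`, and so on), the frame-averaging step, and the `threshold` parameter are all assumptions chosen for illustration. The sketch assumes a classifier has already produced per-frame species probabilities for one tracked fish.

```python
from collections import defaultdict

# Hypothetical non-overlapping hierarchy: each species belongs to exactly one group.
# The labels are placeholders, not taken from the paper's dataset.
HIERARCHY = {
    "group_x": ["species_a", "species_b"],
    "group_y": ["species_c"],
}


def average_frames(frame_probs):
    """Average per-frame species probabilities over all frames of one track."""
    totals = defaultdict(float)
    for probs in frame_probs:
        for species, p in probs.items():
            totals[species] += p
    n = len(frame_probs)
    return {species: total / n for species, total in totals.items()}


def group_scores(species_probs, hierarchy):
    """Sum species probabilities under each group.

    Summing is valid only because the hierarchy is non-overlapping:
    no species contributes to more than one group.
    """
    return {
        group: sum(species_probs.get(s, 0.0) for s in members)
        for group, members in hierarchy.items()
    }


def predict(frame_probs, hierarchy, threshold=0.8):
    """Return (label, confidence), falling back to the group level when
    no species in the best group reaches the assumed threshold."""
    species_probs = average_frames(frame_probs)
    groups = group_scores(species_probs, hierarchy)
    best_group = max(groups, key=groups.get)
    best_species = max(hierarchy[best_group], key=lambda s: species_probs.get(s, 0.0))
    if species_probs.get(best_species, 0.0) >= threshold:
        return best_species, species_probs[best_species]
    return best_group, groups[best_group]


# Two frames in which the species is ambiguous but the group is clear.
ambiguous_track = [
    {"species_a": 0.5, "species_b": 0.4, "species_c": 0.1},
    {"species_a": 0.4, "species_b": 0.5, "species_c": 0.1},
]
label, confidence = predict(ambiguous_track, HIERARCHY)
print(label, round(confidence, 2))
```

In this sketch, a track whose species is ambiguous is still labeled confidently at the group level, which is the kind of output a human reviewer could use to decide which catches need closer inspection.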