Counting Fish with Temporal Representations of Sonar Video
This work addresses the need for accessible fish counting methods for conservation and fishery management, particularly at sites with limited compute and connectivity. Its contribution is incremental, since it builds on prior computer vision techniques.
The paper tackles automated salmon counting from sonar video by proposing a lightweight computer vision method based on echograms, achieving a count error of 23% on data from the Kenai River.
Accurate estimates of salmon escapement - the number of fish migrating upstream to spawn - are key data for conservation and fishery management. Existing methods for salmon counting using high-resolution imaging sonar hardware are non-invasive and compatible with computer vision processing. Prior work in this area has used methods based on object detection and tracking for automated salmon counting. However, these techniques remain inaccessible to many sonar deployment sites due to limited compute and connectivity in the field. We propose an alternative lightweight computer vision method for fish counting based on analyzing echograms - temporal representations that compress several hundred frames of imaging sonar video into a single image. We predict upstream and downstream counts within 200-frame time windows directly from echograms using a ResNet-18 model, and propose a set of domain-specific image augmentations and a weakly-supervised training protocol to further improve results. We achieve a count error of 23% on representative data from the Kenai River in Alaska, demonstrating the feasibility of our approach.
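To make the echogram idea concrete, the following is a minimal sketch of how a sequence of sonar frames might be compressed into a single image and cut into 200-frame windows. The abstract does not specify how the echogram is computed, so two things here are assumptions for illustration only: each frame is collapsed to one column by taking the maximum intensity across beams, and windows are consecutive and non-overlapping. The names `build_echogram` and `split_windows` are hypothetical.

```python
from typing import List

# A frame is indexed as frame[range_bin][beam] and holds echo intensities.
Frame = List[List[float]]
# An echogram is indexed as echogram[range_bin][frame_index].
Echogram = List[List[float]]


def build_echogram(frames: List[Frame]) -> Echogram:
    """Collapse each frame to a single column, giving one column per frame.

    Assumption: the collapse is a max over the beam axis. The actual
    projection used by the authors may differ.
    """
    if not frames:
        return []
    n_range = len(frames[0])
    echogram = [[0.0] * len(frames) for _ in range(n_range)]
    for t, frame in enumerate(frames):
        for r, row in enumerate(frame):
            echogram[r][t] = max(row)
    return echogram


def split_windows(echogram: Echogram, window: int = 200) -> List[Echogram]:
    """Cut the echogram into consecutive windows of `window` frames.

    Assumption: windows do not overlap, and a trailing partial window
    is dropped.
    """
    n_frames = len(echogram[0]) if echogram else 0
    return [
        [row[start:start + window] for row in echogram]
        for start in range(0, n_frames - window + 1, window)
    ]
```

Under these assumptions, each window is a small image of shape (range bins, 200) that could be passed to an image model such as ResNet-18 to predict the upstream and downstream counts for that window.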