CVNov 17, 2022

TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos

arXiv:2211.09950v13 citationsh-index: 27
Originality Incremental advance
AI Analysis

This work addresses the need for automated behavior analysis in marine biology, offering a generic system that could be extended to other behaviors, though it is incremental as it builds on existing video detection methods.

The paper tackles the problem of automatically detecting animal behaviors in underwater videos, proposing TempNet, a deep learning method that achieves 80% accuracy and 0.81 precision in detecting sablefish startle events, with a 31% relative improvement in accuracy over baselines.

Recent advancements in cabled ocean observatories have increased the quality and prevalence of underwater videos; this data enables the extraction of high-level biologically relevant information such as species' behaviours. Despite this increase in capability, most modern methods for the automatic interpretation of underwater videos focus only on the detection and counting organisms. We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder. TempNet also presents temporal attention during spatial encoding as well as Wavelet Down-Sampling pre-processing to improve model accuracy. Although our system is designed for applications to diverse fish behaviours (i.e, is generic), we demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events. We compare the proposed approach with a state-of-the-art end-to-end video detection method (ReMotENet) and a hybrid method previously offered exclusively for the detection of sablefish's startle events in videos from an existing dataset. Results show that our novel method comfortably outperforms the comparison baselines in multiple metrics, reaching a per-clip accuracy and precision of 80% and 0.81, respectively. This represents a relative improvement of 31% in accuracy and 27% in precision over the compared methods using this dataset. Our computational pipeline is also highly efficient, as it can process each 4-second video clip in only 38ms. Furthermore, since it does not employ features specific to sablefish startle events, our system can be easily extended to other behaviours in future works.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes