Weakly Supervised Faster-RCNN+FPN to classify animals in camera trap images
The work reduces the manual classification workload for researchers in biodiversity monitoring, though its contribution is incremental: it adapts existing object detection methods to a specific domain.
The paper tackles the problem of automating animal classification in camera trap images, where animals occupy only a small portion of each high-definition frame. It proposes a weakly supervised Faster-RCNN+FPN workflow that derives bounding box annotations from motion across multiple frames rather than from manual labels, and it reports experimental results on datasets from biodiversity monitoring campaigns in Papua New Guinea and Missouri.
Camera traps have revolutionized research on many animal species that were previously nearly impossible to observe because of their habitat or behavior. They are cameras, generally fixed to a tree, that take a short sequence of images when triggered. Deep learning has the potential to reduce the resulting workload by automatically classifying images by taxon or as empty. However, a standard deep neural network classifier fails because animals often occupy only a small portion of the high-definition images. We therefore propose a workflow named Weakly Object Detection Faster-RCNN+FPN, which suits this challenge. The model is weakly supervised because it requires only the animal taxon label per image and no manual bounding box annotations. First, it automatically generates bounding box annotations using the motion across multiple frames. Then, it trains a Faster-RCNN+FPN model using this weak supervision. Experimental results were obtained on two datasets, from biodiversity monitoring campaigns in Papua New Guinea and Missouri, and then on an easily reproducible testbed.
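The first step of the workflow, deriving a bounding box from motion across the frames of one trigger sequence, might look roughly like the sketch below. The abstract does not say how motion is detected, so the median background subtraction used here, the function name `motion_bounding_box`, and the `threshold` value are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def motion_bounding_box(frames, index=0, threshold=30.0):
    """Return (x_min, y_min, x_max, y_max) of the moving region in one frame.

    frames: grayscale sequence of shape (n_frames, height, width).
    index: which frame of the sequence to annotate.
    threshold: hypothetical intensity difference above which a pixel
        counts as moving (an assumed value, not taken from the paper).
    Returns None when no pixel exceeds the threshold.
    """
    stack = np.asarray(frames, dtype=np.float32)
    # Assumption: the per-pixel median over the sequence approximates the
    # static background, since the camera is fixed.
    background = np.median(stack, axis=0)
    # Pixels that differ strongly from the background are treated as motion.
    moving = np.abs(stack[index] - background) > threshold
    if not moving.any():
        return None  # no motion found: candidate empty image
    ys, xs = np.nonzero(moving)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Synthetic three-frame sequence with a bright blob that moves between frames.
sequence = np.zeros((3, 20, 30), dtype=np.uint8)
sequence[0, 5:9, 10:15] = 200
sequence[1, 5:9, 18:23] = 200
box = motion_bounding_box(sequence, index=0)
```

A box produced this way, paired with the image-level taxon label, could then serve as the weak annotation for training the detector. A real pipeline would likely need noise filtering, for example against moving vegetation or lighting changes, which this sketch omits.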