Identifying Cocoa Pollinators: A Deep Learning Dataset
This dataset addresses the need for better pollination monitoring to improve cocoa yields. Its contribution is incremental, since it applies existing deep learning methods to new data.
To address the limited research on cocoa pollination, the authors created the first dataset of cocoa flower visitors. It contains 5,792 images across five arthropod taxa (Ceratopogonidae, Formicidae, Aphididae, Araneae, and Encyrtidae) and 1,082 background images. They demonstrated its use with YOLOv8 models: a medium-sized model trained with 8% background images achieved an F1 score of 0.71 and an mAP50 of 0.70.
Cocoa is a multi-billion-dollar industry, but research on improving yields through pollination remains limited. New embedded hardware and AI-based data analysis are advancing knowledge of cocoa flower visitors, their identity, and their implications for yields. We present the first cocoa flower visitor dataset, containing 5,792 images of Ceratopogonidae, Formicidae, Aphididae, Araneae, and Encyrtidae, and 1,082 background cocoa flower images. This dataset was curated from 23 million images collected over two years by embedded cameras in cocoa plantations in Hainan province, China. We exemplify the use of the dataset by training YOLOv8 models of different sizes and by progressively increasing the background image ratio in the training set to identify the best-performing model. The medium-sized YOLOv8 model achieved the best results with 8% background images (F1 score of 0.71, mAP50 of 0.70). Overall, this dataset is useful for comparing the performance of deep learning model architectures on low-contrast images with difficult detection targets. The data can support future efforts to advance sustainable cocoa production through pollination monitoring projects.
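The background-ratio sweep described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "8% background images" means background images make up 8% of the final training set, and the function names, the list of background fractions, and the choice of three model sizes are hypothetical. The weight file names follow the naming used by the Ultralytics YOLOv8 release, but no training call is made here.

```python
import random


def backgrounds_needed(n_labeled: int, bg_fraction: float) -> int:
    """Number of background images so they form bg_fraction of the final set.

    Assumption: the fraction is measured against labeled + background images.
    """
    if not 0.0 <= bg_fraction < 1.0:
        raise ValueError("bg_fraction must be in [0, 1)")
    # Solve b / (n_labeled + b) = bg_fraction for b.
    return round(bg_fraction * n_labeled / (1.0 - bg_fraction))


def build_training_list(labeled, backgrounds, bg_fraction, seed=0):
    """Return labeled images plus a reproducible random sample of backgrounds."""
    n_bg = backgrounds_needed(len(labeled), bg_fraction)
    if n_bg > len(backgrounds):
        raise ValueError("not enough background images for this fraction")
    rng = random.Random(seed)  # fixed seed so every run draws the same sample
    return list(labeled) + rng.sample(list(backgrounds), n_bg)


# Hypothetical sweep: each (weights, fraction) pair is one training run.
MODEL_WEIGHTS = ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt"]
BG_FRACTIONS = [0.0, 0.04, 0.08, 0.12]
experiments = [(w, f) for w in MODEL_WEIGHTS for f in BG_FRACTIONS]
```

Each entry in `experiments` would correspond to one training run, with `build_training_list` supplying the image list for that run's background fraction. Comparing F1 and mAP50 across the grid is one way to find the best-performing combination.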