ReLeaf: Benchmarking Leaf Segmentation across Domains and Species
For researchers in precision agriculture, this work provides a benchmark and baseline for leaf segmentation, though the performance drop across domains highlights remaining challenges.
The paper addresses the lack of leaf-level segmentation datasets and systematic evaluations for precision agriculture. The authors combine four existing datasets and introduce a new benchmark covering 23 species. A YOLO-based model trained on the four existing datasets achieves a mean mAP50-95 of 83.9% across their test sets and 40.2% on the new benchmark, which indicates improved generalization but still a substantial performance drop across domains.
Rising global food demand and growing climate pressure increase the need for sustainable, precise agricultural practices. Automated, individualized plant treatment relies on fine-grained visual analysis, yet leaf-level segmentation remains underexplored despite its value for assessing crop health, growth dynamics, yield potential and localized stress symptoms. Progress is limited by a lack of dedicated datasets, especially regarding species coverage, and by the absence of systematic evaluations of modern instance-segmentation architectures for this task. We address these gaps by surveying current data and identifying four suitable, publicly available leaf-segmentation datasets. Using them, we compare one-stage, two-stage and Transformer-based detectors and identify a YOLO26 model configuration as offering the best trade-off for real-world precision-agriculture tasks. Extensive cross-domain generalization experiments reveal substantial performance drops across plant species and recording setups, especially for models trained solely on laboratory data. To strengthen data availability, we introduce a new benchmark dataset with leaf-level masks for 23 plant species, created via semi-automatic annotation of selected CropAndWeed images. A model trained on all four existing datasets achieves a mean mAP50-95 of 83.9% across their corresponding test sets and 40.2% on our new benchmark, demonstrating improved generalization and highlighting the need for diverse leaf-segmentation datasets for robust precision agriculture.
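
The mAP50-95 metric reported above averages the average precision (AP) over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates that idea for mask predictions on a single image and a single class. It is a simplified illustration, not the evaluation code used in the paper: the function names (`mask_iou`, `average_precision`, `map50_95`, `rect`) and the pixel-set mask representation are assumptions made for this example, and it uses all-point interpolation of the precision-recall curve rather than the 101-point COCO scheme.

```python
from typing import List, Set, Tuple

# Assumption for this sketch: a leaf mask is a set of (row, col) pixels.
Mask = Set[Tuple[int, int]]


def rect(rows: int, cols: int) -> Mask:
    """Hypothetical helper: a filled rows x cols rectangle at the origin."""
    return {(r, c) for r in range(rows) for c in range(cols)}


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel-set masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Tuple[float, Mask]],
                      gts: List[Mask], iou_thr: float) -> float:
    """AP at one IoU threshold; preds are (confidence, mask) pairs."""
    if not gts:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue  # each ground-truth leaf can be matched once
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        hit = best_j >= 0 and best_iou >= iou_thr
        if hit:
            matched.add(best_j)
        hits.append(hit)

    # Build the precision-recall curve.
    tp = 0
    precisions, recalls = [], []
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing from right to left, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50_95(preds: List[Tuple[float, Mask]], gts: List[Mask]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy example: one 10x10 ground-truth leaf and one prediction covering 82 of
# its 100 pixels (IoU 0.82), which passes 7 of the 10 thresholds.
gt_leaf = rect(10, 10)
pred_leaf = rect(8, 10) | {(8, 0), (8, 1)}
score = map50_95([(0.9, pred_leaf)], [gt_leaf])
```

In this toy case a prediction that would count as correct under mAP50 alone is penalized at the stricter thresholds, which is why mAP50-95 rewards tight mask boundaries. A full benchmark evaluation would additionally aggregate over all images of a test set before reporting the score, and the mean across several test sets is then a plain average of the per-dataset values.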