CV, LG · Jan 1, 2025

DDD: Discriminative Difficulty Distance for plant disease diagnosis

Yuji Arima, Satoshi Kagiwada, Hitoshi Iyatomi

arXiv:2501.00734v1 · 3.62 citations · h-index: 29

Originality: Incremental advance

AI Analysis

This work addresses insufficient data diversity in plant disease diagnosis for agricultural applications. It is an incremental advance that contributes a novel metric.

The study tackles overestimated diagnostic performance in plant disease classification, which arises when training and test data are partitioned from the same source. It proposes Discriminative Difficulty Distance (DDD), a metric that quantifies the domain gap between datasets and the classification difficulty of the test data. Compared with a base encoder pre-trained only on ImageNet21K, the correlation between DDD and diagnosis difficulty increased by 0.106 to 0.485, reaching a maximum of 0.909.

Recent studies on plant disease diagnosis using machine learning (ML) have raised concerns that diagnostic performance is overestimated when data are partitioned inappropriately, that is, when the training and test datasets are derived from the same source (domain). Plant disease diagnosis is a challenging classification task, characterized by its fine-grained nature, vague symptoms, and the extensive variability of image features within each domain. In this study, we propose Discriminative Difficulty Distance (DDD), a novel metric designed to quantify the domain gap between training and test datasets while assessing the classification difficulty of the test data. DDD helps identify insufficient diversity in training data, thus supporting the development of more diverse and robust datasets. We investigated multiple image encoders trained on different datasets and examined whether distances between datasets, measured on the low-dimensional representations generated by these encoders, are suitable as a DDD metric. The study used 244,063 plant disease images spanning four crops and 34 disease classes, collected from 27 domains. We demonstrated that, even when the test images come from crops or diseases different from those used to train the encoder, incorporating them allows the construction of a dataset distance measure that strongly correlates with the diagnostic difficulty indicated by an independently developed disease classifier. Compared with the base encoder pre-trained only on ImageNet21K, the correlation was higher by 0.106 to 0.485, reaching a maximum of 0.909.
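The abstract does not spell out how the distance between datasets is computed, so the following is only a rough sketch of the general idea under stated assumptions. It assumes every image has already been mapped to a low-dimensional embedding by some encoder. It uses a hypothetical `mean_nn_distance` (the average Euclidean distance from each test embedding to its nearest training embedding) as a stand-in for DDD. It then checks how well that distance tracks per-domain classifier error using a Pearson correlation. The distance choice, the correlation measure, the function names, and all numbers are illustrative assumptions, not the authors' method or results.

```python
import math
from typing import Sequence

# A point in the (assumed) low-dimensional embedding space produced by an encoder.
Vector = Sequence[float]


def mean_nn_distance(test: Sequence[Vector], train: Sequence[Vector]) -> float:
    """Hypothetical dataset distance (a stand-in for DDD, not the paper's
    definition): average Euclidean distance from each test embedding to its
    nearest training embedding."""
    return sum(min(math.dist(t, r) for r in train) for t in test) / len(test)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy 2-D embeddings, made up for illustration (not data from the paper).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_domains = {
    "near": [(0.1, 0.0), (0.0, 0.2)],
    "mid": [(2.0, 0.0), (0.0, 2.0)],
    "far": [(3.0, 3.0), (4.0, 0.0)],
}
# Toy per-domain classifier error rates standing in for "diagnosis difficulty".
error_rate = {"near": 0.05, "mid": 0.20, "far": 0.45}

names = list(test_domains)
distances = [mean_nn_distance(test_domains[d], train) for d in names]
difficulties = [error_rate[d] for d in names]
r = pearson(distances, difficulties)

for d, dist in zip(names, distances):
    print(f"{d}: distance={dist:.3f}, error={error_rate[d]:.2f}")
print(f"Pearson r = {r:.3f}")
```

With real data, the embeddings would come from the encoders under study and the difficulty values from an independently trained disease classifier. A rank correlation such as Spearman's could be substituted if only the ordering of domains matters.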

