hYOLO Model: Enhancing Object Classification with Hierarchical Context in YOLOv8
This work aims to improve object classification accuracy in computer vision applications by incorporating hierarchical context, an incremental advance over existing YOLO models.
The paper addresses the limits of flat object classification in CNNs by proposing a hierarchical model based on YOLOv8. It introduces a novel architecture, loss function, and metric that leverage object hierarchies, and shows that the model captures hierarchical structure that conventional methods overlook.
Current convolutional neural network (CNN) classification methods predominantly focus on flat classification, which aims solely to identify a specified object within an image. However, real-world objects often possess a natural hierarchical organization that can significantly help classification tasks. Capturing the relations between objects enables better contextual understanding as well as control over the severity of mistakes. Considering these aspects, this paper proposes an end-to-end hierarchical model for image detection and classification built upon the YOLO model family. A novel hierarchical architecture, a modified loss function, and a performance metric tailored to the hierarchical nature of the model are introduced. The proposed model is trained and evaluated on two different hierarchical categorizations of the same dataset: a systematic categorization that disregards visual similarities between objects, and a categorization that accounts for common visual characteristics across classes. The results illustrate how the suggested methodology addresses the inherent hierarchical structure present in real-world objects, which conventional flat classification algorithms often overlook.
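To make the notion of "severity of mistakes" concrete, the following is a minimal sketch of one common way to grade a misclassification against a class hierarchy: counting the edges between the true and predicted labels in the class tree. It is an illustration only, not the metric proposed in the paper; the class names, the `PARENT` mapping, and the `mistake_severity` function are all hypothetical.

```python
# Hypothetical class hierarchy (child -> parent); the root has parent None.
# These labels are illustrative and are not taken from the paper's dataset.
PARENT = {
    "object": None,
    "vehicle": "object",
    "animal": "object",
    "car": "vehicle",
    "truck": "vehicle",
    "cat": "animal",
    "dog": "animal",
}


def path_to_root(label):
    """Return the list of labels from `label` up to the root, inclusive."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def mistake_severity(true_label, predicted_label):
    """Number of tree edges between the two labels (0 means a correct prediction).

    Assumed severity measure for illustration: confusing siblings costs less
    than confusing labels that only share the root.
    """
    true_path = path_to_root(true_label)
    pred_path = path_to_root(predicted_label)
    # Distance of each ancestor of the prediction from the prediction itself.
    pred_depth = {node: steps for steps, node in enumerate(pred_path)}
    # Walk up from the true label until the first shared ancestor is reached.
    for steps_up, node in enumerate(true_path):
        if node in pred_depth:
            return steps_up + pred_depth[node]
    raise ValueError("labels do not share a root")


if __name__ == "__main__":
    for pred in ("car", "vehicle", "truck", "dog"):
        print(pred, mistake_severity("car", pred))
```

Under this assumed measure, predicting `truck` for a `car` is a milder error than predicting `dog`, whereas a flat classifier would score both as equally wrong.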