CV | Jul 25, 2025

YOLO for Knowledge Extraction from Vehicle Images: A Baseline Study

Saraa Al-Saddik, Manna Elizabeth Philip, Ali Haidar

arXiv:2507.18966v1 | 1 citation | h-index: 1

Originality: Synthesis-oriented

AI Analysis

The work provides a robust baseline for extracting vehicle metadata in real-world scenarios, which can help law enforcement and intelligence agencies filter and sort queries more efficiently. It is, however, incremental, since it applies existing methods to a new dataset.

The study addresses the accurate identification of vehicle attributes such as make, colour, and shape from images for law enforcement and intelligence applications. Using YOLO-based deep learning approaches on a real-world dataset, it achieves classification accuracies of up to 93.70% for make and 94.86% for the colour-binary model.

Accurate identification of vehicle attributes such as make, colour, and shape is critical for law enforcement and intelligence applications. This study evaluates the effectiveness of three state-of-the-art deep learning approaches, YOLO-v11, YOLO-World, and YOLO-Classification, on a real-world vehicle image dataset. The dataset was collected under challenging and unconstrained conditions by NSW Police Highway Patrol vehicles. A multi-view inference (MVI) approach was deployed to improve the models' predictions. For the analyses, datasets of more than 100,000 images were created for each of the three metadata prediction tasks: make, shape, and colour. The models were tested on a separate dataset of 29,937 images belonging to 1,809 number plates. Several sets of experiments were conducted by varying the model sizes. The best-performing make, shape, colour, and colour-binary models achieved classification accuracies of 93.70%, 82.86%, 85.19%, and 94.86%, respectively. We conclude that MVI is needed to obtain usable models on such complex real-world datasets. Our findings indicate that the object detection models YOLO-v11 and YOLO-World outperform classification-only models in make and shape extraction. Moreover, smaller YOLO variants perform comparably to their larger counterparts, offering substantial efficiency benefits for real-time prediction. This work provides a robust baseline for extracting vehicle metadata in real-world scenarios. Such models can be used to filter and sort user queries, minimising the time required to search large vehicle image datasets.
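The abstract does not spell out how MVI combines views, only that the test images are grouped by number plate. A minimal sketch, assuming MVI means pooling the per-image class probabilities of every image that shares a plate (mean of the probability vectors, then arg-max), could look like this. The function names `multi_view_predict` and `plate_accuracy`, the make labels, the plate IDs, and the probabilities are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def multi_view_predict(
    plate_ids: Sequence[str],
    probs: Sequence[Sequence[float]],
    class_names: Sequence[str],
) -> Dict[str, str]:
    """Return one label per number plate by averaging per-image probabilities.

    Assumption (hypothetical): MVI is approximated as mean probability pooling
    over all views of the same plate; the paper's exact rule may differ.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for plate, p in zip(plate_ids, probs):
        if len(p) != len(class_names):
            raise ValueError("probability vector length must match class_names")
        # Accumulate this view's probabilities into its plate's running total.
        acc = sums.setdefault(plate, [0.0] * len(class_names))
        for i, value in enumerate(p):
            acc[i] += value
        counts[plate] = counts.get(plate, 0) + 1

    labels: Dict[str, str] = {}
    for plate, acc in sums.items():
        mean = [value / counts[plate] for value in acc]
        # Index of the highest mean probability (first one wins on ties).
        best = max(range(len(mean)), key=mean.__getitem__)
        labels[plate] = class_names[best]
    return labels


def plate_accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of plates whose aggregated label matches the ground truth."""
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(predicted.get(plate) == label for plate, label in truth.items())
    return correct / len(truth)


# Hypothetical example: three views of one car, one view of another.
classes = ["Toyota", "Mazda", "Ford"]
plates = ["ABC123", "ABC123", "ABC123", "XYZ789"]
probs = [
    [0.40, 0.50, 0.10],  # on its own, this view would be labelled "Mazda"
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.20, 0.70],
]
preds = multi_view_predict(plates, probs, classes)
# preds == {"ABC123": "Toyota", "XYZ789": "Ford"}
print(plate_accuracy(preds, {"ABC123": "Toyota", "XYZ789": "Ford"}))
```

Pooling probabilities lets confident views outvote a single misleading one: here the first view of `ABC123` is wrong on its own, yet the plate-level label is correct. Majority voting over per-image labels is a simpler alternative. This listing does not say which rule the authors used.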

