Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification
This work addresses CAPTCHA detection for web security applications, but it is incremental as it benchmarks existing YOLO models without introducing new methods.
The paper compared YOLOv5, YOLOv8, and YOLOv10 models for detecting and classifying CAPTCHAs on webpages, finding that nano variants were fastest while more complex architectures performed better on metrics like mAP@50 and F1 score.
This paper analyzes and compares the YOLOv5, YOLOv8, and YOLOv10 models for CAPTCHA detection on webpages, using datasets collected from the web and the darknet as well as synthesized webpage data. The study examines the nano (n), small (s), and medium (m) variants of each YOLO architecture and uses Precision, Recall, F1 score, mAP@50, and inference speed to assess real-world utility. It also examines whether a trained model can be tuned efficiently to detect new CAPTCHA patterns, which is crucial for real-world applications. An image slicing method is proposed to improve detection metrics on oversized input images, a common scenario in webpage analysis. The nano variants achieved the best results in terms of speed, while the more complex architectures scored better on the other metrics.
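The abstract does not describe how the image slicing is implemented, so the following is only a minimal sketch of one common way to slice an oversized webpage screenshot into overlapping tiles and map per-tile detections back to page coordinates. The function names, the 640-pixel tile size, and the 20% overlap are illustrative assumptions, not values taken from the paper; the detector call itself is omitted.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels; a hypothetical box type used only in this sketch.
Box = Tuple[int, int, int, int]


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        # Add a final tile flush with the far edge so no pixels are skipped.
        starts.append(length - tile)
    return starts


def slice_windows(width: int, height: int,
                  tile: int = 640, overlap: float = 0.2) -> List[Box]:
    """Return overlapping tile windows covering a width x height image.

    tile=640 and overlap=0.2 are assumed defaults, not the paper's settings.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(round(tile * (1.0 - overlap))))
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def to_page_coords(box: Box, window: Box) -> Box:
    """Shift a box predicted inside a tile back into full-page coordinates."""
    ox, oy = window[0], window[1]
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

In use, each window would be cropped from the screenshot and passed to the detector, and the resulting boxes shifted with `to_page_coords`. Because the tiles overlap, the same CAPTCHA can be detected more than once, so a duplicate-merging step such as non-maximum suppression would still be needed afterwards; that step is not shown here.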