CV · Aug 25, 2025

LPLC: A Dataset for License Plate Legibility Classification

Lucas Wojcik, Gabriel E. Lima, Valfride Nascimento, Eduil Nascimento, Rayson Laroca, David Menotti

arXiv:2508.18425v2 · 8.42 citations · h-index: 32 · Has Code · SIBGRAPI

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific challenge in automatic license plate recognition by providing a dataset that supports selective image pre-processing. Its contribution is incremental, as it focuses on data creation and benchmarking.

The authors address the problem of classifying license plate legibility to make automatic recognition more efficient. They introduce the LPLC dataset, with 10,210 images and 12,687 annotated plates, and benchmark it with three baseline models (ViT, ResNet, and YOLO), all of which achieve overall F1 scores below 80%.

Automatic License Plate Recognition (ALPR) faces a major challenge when dealing with illegible license plates (LPs). While reconstruction methods such as super-resolution (SR) have emerged, the core issue of recognizing these low-quality LPs remains unresolved. To optimize model performance and computational efficiency, image pre-processing should be applied selectively, only to cases that require enhanced legibility. To support research in this area, we introduce a novel dataset for legibility classification, the LPLC dataset, comprising 10,210 vehicle images with 12,687 annotated LPs. The images span a wide range of vehicle types, lighting conditions, and camera/image quality levels. We adopt a fine-grained annotation strategy that includes vehicle- and LP-level occlusions, four legibility categories (perfect, good, poor, and illegible), and character labels for the three legible categories (i.e., all except illegible LPs). As a benchmark, we propose a classification task using three image recognition networks to determine whether an LP image is good enough for recognition, requires super-resolution, or is completely unrecoverable. The overall F1 score, which remained below 80% for all three baseline models (ViT, ResNet, and YOLO), together with the analyses of SR and LP recognition methods, highlights the difficulty of the task and reinforces the need for further research. The proposed dataset is publicly available at https://github.com/lmlwojcik/lplc-dataset.
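The selective pre-processing idea in the abstract can be sketched as a routing step: classify each LP crop, then recognize it directly, super-resolve it first, or skip it. The Python below is a minimal sketch under stated assumptions, not the authors' pipeline. The mapping of the four annotation categories onto the three benchmark classes (perfect and good map to "recognize", poor maps to "super-resolve", illegible maps to "discard") is our assumption, as are the hypothetical names `Action`, `LEGIBILITY_TO_ACTION`, `process_plate`, and `macro_f1`, and the placeholder `classify`, `super_resolve`, and `recognize` callables. The abstract also does not say how the overall F1 score is averaged; macro-averaging over the three classes is assumed here.

```python
from enum import Enum
from typing import Callable, Hashable, Optional, Sequence, TypeVar

Image = TypeVar("Image")  # any image type; the sketch never inspects pixels


class Legibility(Enum):
    """The four LP legibility categories annotated in LPLC."""
    PERFECT = "perfect"
    GOOD = "good"
    POOR = "poor"
    ILLEGIBLE = "illegible"


class Action(Enum):
    """Hypothetical three-class target: what to do with an LP crop."""
    RECOGNIZE = "recognize"          # good enough: recognize as-is
    SUPER_RESOLVE = "super_resolve"  # enhance with SR before recognition
    DISCARD = "discard"              # unrecoverable: skip recognition


# ASSUMPTION: how the four annotation categories collapse into three classes.
LEGIBILITY_TO_ACTION = {
    Legibility.PERFECT: Action.RECOGNIZE,
    Legibility.GOOD: Action.RECOGNIZE,
    Legibility.POOR: Action.SUPER_RESOLVE,
    Legibility.ILLEGIBLE: Action.DISCARD,
}


def process_plate(
    image: Image,
    classify: Callable[[Image], Legibility],
    super_resolve: Callable[[Image], Image],
    recognize: Callable[[Image], str],
) -> Optional[str]:
    """Run SR only when the (hypothetical) legibility classifier asks for it."""
    action = LEGIBILITY_TO_ACTION[classify(image)]
    if action is Action.DISCARD:
        return None  # do not spend compute on an unrecoverable plate
    if action is Action.SUPER_RESOLVE:
        image = super_resolve(image)
    return recognize(image)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             labels: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaging)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage: per-class F1 is 2/3, 2/3 and 1, so the macro average is 7/9.
y_true = [Action.RECOGNIZE, Action.RECOGNIZE, Action.SUPER_RESOLVE, Action.DISCARD]
y_pred = [Action.RECOGNIZE, Action.SUPER_RESOLVE, Action.SUPER_RESOLVE, Action.DISCARD]
print(round(macro_f1(y_true, y_pred, list(Action)), 4))  # 0.7778
```

The saving comes from the routing: the SR model runs only on crops the classifier flags, and illegible crops skip recognition entirely. For real evaluations, scikit-learn's `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro-averaged quantity; the hand-rolled version above only makes the averaging explicit.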

