CV · Apr 9

LPLCv2: An Expanded Dataset for Fine-Grained License Plate Legibility Classification

Lucas Wojcik, Eduardo A. F. Machoski, Eduil Nascimento, Rayson Laroca, David Menotti

arXiv:2604.0874170.91 citations · h-index: 17 · Has Code

Predicted impact top 38% in CV · last 90 days · Originality Incremental advance

AI Analysis

This work addresses license plate legibility classification, which supports ALPR systems in real-world scenarios. The advance is incremental because it builds on an existing benchmark.

The authors tackled the challenge of fine-grained license plate legibility classification by expanding a benchmark dataset to over three times its original size with revised annotations and novel labels, enabling a baseline model to achieve an 89.5% F1-score on the test set, surpassing previous state-of-the-art results.

Modern Automatic License Plate Recognition (ALPR) systems achieve outstanding performance in controlled, well-defined scenarios. However, large-scale real-world usage remains challenging due to low-quality imaging devices, compression artifacts, and suboptimal camera installation. Identifying illegible license plates (LPs) has recently become feasible through a dedicated benchmark; however, its impact has been limited by its small size and annotation errors. In this work, we expand the original benchmark to over three times its size with two extra capture days, revise its annotations, and introduce novel labels. LP-level annotations include bounding boxes, text, and legibility level, while vehicle-level annotations comprise make, model, type, and color. Image-level annotations feature camera identity, capture conditions (e.g., rain and faulty cameras), acquisition time, and day ID. We present a novel training procedure featuring an Exponential Moving Average-based loss function and a refined learning rate scheduler, addressing common mistakes in testing. These improvements enable a baseline model to achieve an 89.5% F1-score on the test set, considerably surpassing the previous state of the art. We further introduce a novel protocol that explicitly addresses camera contamination between training and evaluation splits; the results show that its impact is small. Dataset and code are publicly available at https://github.com/lmlwojcik/LPLCv2-Dataset.
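
The abstract does not spell out how the camera-contamination protocol is built. As an illustration only, one common way to avoid this kind of leakage is to split by camera, so that no camera contributes images to both training and evaluation. The sketch below assumes each image record carries a camera identifier; the `camera_id` key, the `camera_disjoint_split` function, and the `test_fraction` parameter are hypothetical names, not taken from the LPLCv2 code.

```python
import random
from collections import defaultdict


def camera_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split records so that no camera appears in both train and test.

    `records` is a list of dicts with a hypothetical "camera_id" key;
    the real dataset may store camera identity differently.
    """
    # Group images by the camera that captured them.
    by_camera = defaultdict(list)
    for rec in records:
        by_camera[rec["camera_id"]].append(rec)

    # Shuffle cameras (not images) with a fixed seed for reproducibility.
    cameras = sorted(by_camera)
    random.Random(seed).shuffle(cameras)

    # Hold out at least one camera when possible, but never all of them.
    n_test = max(1, round(len(cameras) * test_fraction))
    n_test = min(n_test, len(cameras) - 1)
    test_cameras = set(cameras[:n_test])

    train = [r for c in cameras if c not in test_cameras for r in by_camera[c]]
    test = [r for c in cameras if c in test_cameras for r in by_camera[c]]
    return train, test


# Toy usage: 50 images spread evenly over 5 made-up cameras.
records = [{"image": f"img_{i}.jpg", "camera_id": f"cam{i % 5}"} for i in range(50)]
train, test = camera_disjoint_split(records)
```

Splitting at the camera level rather than the image level trades some control over split sizes for a guarantee that evaluation images come from viewpoints the model has not seen during training.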
