Open data for Moroccan license plates for OCR applications: data collection, labeling, and model construction
This work provides a domain-specific resource for researchers and developers working on OCR applications in traffic management, particularly for regions whose license plates carry non-Latin characters. Its contribution is incremental, since it focuses on data collection rather than novel methods.
The authors tackled the lack of open datasets for license plate recognition in Morocco, where Arabic characters are used, by creating a labeled dataset of 705 images and demonstrating its utility through model comparisons and data augmentation.
A significant body of research has recently developed around intelligent systems for traffic management, especially OCR-based license plate recognition, which is considered a main step in any automatic traffic management system. Good-quality data sets are increasingly needed, and are being produced by the research community, to improve the performance of these algorithms. A particular need for data is noted for countries whose license plates carry special characters, such as Morocco, where the Arabic alphabet is used. In this work, we present a labeled open data set of license plates photographed in Morocco on different types of vehicles, namely cars, trucks, and motorcycles. The data was collected manually and consists of 705 unique images. It is labeled both for plate segmentation and for OCR of the registration number. As we show in this paper, the data can be enriched using data augmentation techniques to create training sets of a few thousand images for different machine learning and AI applications. We present and compare a set of models built on this data. We publish the data as open access to encourage innovation and applications in OCR and image processing for traffic control, and for other applications in transportation and heterogeneous vehicle management.
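To make the data augmentation step mentioned above concrete, the following is a minimal sketch of how a small set of plate images could be expanded into a larger training set. It is not the authors' pipeline: the function names (`shift_brightness`, `translate_x`, `add_noise`, `augment`), the choice of transforms, the parameter ranges, and the factor of four variants per image are all assumptions made for illustration. Images are represented as plain nested lists of grayscale values so that the example runs with the Python standard library alone.

```python
import random

def shift_brightness(img, delta):
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def translate_x(img, dx, fill=0):
    """Shift the image dx pixels right (left if dx < 0), padding with fill.

    Any segmentation label (e.g. a plate bounding box) would have to be
    shifted by the same dx; that bookkeeping is omitted here.
    """
    width = len(img[0])
    dx = max(-width, min(width, dx))  # keep the output width unchanged
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]

def add_noise(img, sigma, rng):
    """Add Gaussian noise with standard deviation sigma, clamped to 0-255."""
    return [[min(255, max(0, p + round(rng.gauss(0, sigma)))) for p in row]
            for row in img]

def augment(img, n_variants, seed=0):
    """Return n_variants randomly perturbed copies of img (hypothetical recipe)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    variants = []
    for _ in range(n_variants):
        out = shift_brightness(img, rng.randint(-40, 40))
        out = translate_x(out, rng.randint(-2, 2))
        out = add_noise(out, sigma=5.0, rng=rng)
        variants.append(out)
    return variants

# Toy 2x3 grayscale array standing in for a plate photograph.
plate = [[10, 20, 30], [40, 50, 60]]
variants = augment(plate, n_variants=4)

# Assumed factor: keeping each original plus 4 variants per image.
augmented_size = 705 * (1 + 4)
print(len(variants), augmented_size)
```

Under the assumed factor of four variants per image, the 705 originals would yield 3525 training images, which is in the "few thousand" range the abstract describes. Horizontal flipping is deliberately left out of this sketch, since a mirrored plate no longer shows readable characters and would likely hurt an OCR model rather than help it.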