CV | LG | Jan 17, 2022

OmniPrint: A Configurable Printed Character Synthesizer

arXiv:2201.06648v1 | 7 citations | Has Code
AI Analysis

OmniPrint gives ML researchers a flexible data synthesis tool, but the contribution is incremental: it follows the model of existing datasets such as MNIST and Omniglot.

The authors address the need for diverse synthetic data in machine learning by introducing OmniPrint, a configurable generator of isolated printed characters that supports multiple languages, fonts, and distortions. The tool includes 935 fonts from 27 scripts, and the authors demonstrate use cases such as building meta-learning datasets.

We introduce OmniPrint, a synthetic data generator of isolated printed characters, geared toward machine learning research. It draws inspiration from famous datasets such as MNIST, SVHN and Omniglot, but offers the capability of generating a wide variety of printed characters from various languages, fonts and styles, with customized distortions. We include 935 fonts from 27 scripts and many types of distortions. As a proof of concept, we show various use cases, including an example of meta-learning dataset designed for the upcoming MetaDL NeurIPS 2021 competition. OmniPrint is available at https://github.com/SunHaozhe/OmniPrint.
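To make the meta-learning use case concrete, here is a minimal sketch of how a labeled set of synthesized characters could be split into few-shot episodes. It is not OmniPrint's API or the competition's protocol: the function name `sample_episode`, its parameters, and the toy labels are all hypothetical, and the N-way K-shot setup is assumed as the common convention for Omniglot-style benchmarks.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=1, n_query=2, rng=None):
    """Sample one N-way K-shot episode from a list of class labels.

    Returns (support, query), each a list of (image_index, label) pairs.
    Hypothetical helper for illustration; not part of OmniPrint.
    """
    rng = rng or random.Random()

    # Group image indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Keep only classes with enough images for support plus query.
    per_class = k_shot + n_query
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= per_class)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with enough examples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        # Draw distinct images so support and query never overlap.
        picked = rng.sample(by_class[c], per_class)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


# Toy stand-in for a generated dataset: 6 character classes, 4 images each.
labels = [ch for ch in "ABCDEF" for _ in range(4)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=2,
                                rng=random.Random(0))
```

Each episode draws its classes at random, so a generator that can produce many character classes across scripts and fonts yields a large pool of distinct tasks.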

Code Implementations: 2 repos