Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges
This work addresses a practical document-processing challenge for applications such as postal services and banking. Its contribution is incremental, since it focuses on dataset creation rather than on new methods.
The paper tackles multi-digit number recognition in realistic handwritten settings, such as ZIP Codes and check amounts, by creating benchmark datasets of same-writer digit sequences. It finds that classifiers that excel at isolated digit classification may still perform poorly on these tasks.
Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same person. Examples include ZIP Codes, handwritten check amounts, and appointment times. In this work, we leverage knowledge about the writers of NIST digit images to create more realistic benchmark multi-digit writer (MDW) data sets. As expected, we find that classifiers may perform well on isolated digits yet do poorly on multi-digit number recognition. If we want to solve real number recognition problems, additional advances are needed. The MDW benchmarks come with task-specific performance metrics that go beyond typical error calculations to more closely align with real-world impact. They also create opportunities to develop methods that can leverage task-specific knowledge to improve performance well beyond that of individual digit classification methods.
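To illustrate why strong isolated-digit performance need not carry over to whole numbers, the following minimal sketch contrasts per-digit accuracy with exact-match number accuracy. It is an illustration only, not the MDW benchmark's own metric: the function names and the sample ZIP Codes are hypothetical, predictions are assumed to have the same length as the true numbers, and `expected_number_accuracy` assumes independent digit errors, which may not hold for same-writer sequences.

```python
from typing import Sequence


def digit_accuracy(true_numbers: Sequence[str], predicted_numbers: Sequence[str]) -> float:
    """Fraction of individual digits predicted correctly (hypothetical helper)."""
    total = 0
    correct = 0
    for truth, pred in zip(true_numbers, predicted_numbers):
        # Assumption: each prediction has the same number of digits as the truth.
        for t, p in zip(truth, pred):
            total += 1
            correct += int(t == p)
    return correct / total


def number_accuracy(true_numbers: Sequence[str], predicted_numbers: Sequence[str]) -> float:
    """Fraction of numbers in which every digit is correct (exact match)."""
    matches = sum(int(t == p) for t, p in zip(true_numbers, predicted_numbers))
    return matches / len(true_numbers)


def expected_number_accuracy(per_digit_accuracy: float, num_digits: int) -> float:
    """Exact-match accuracy if digit errors were independent (a simplifying assumption)."""
    return per_digit_accuracy ** num_digits


# Made-up example: four 5-digit ZIP Codes, two of them with a single wrong digit.
true_zips = ["90210", "10001", "60614", "73301"]
pred_zips = ["90210", "10007", "60614", "78301"]

print(digit_accuracy(true_zips, pred_zips))    # 18 of 20 digits correct
print(number_accuracy(true_zips, pred_zips))   # 2 of 4 numbers fully correct
print(expected_number_accuracy(0.99, 5))       # roughly 0.951 under independence
```

In this made-up example, 90% of the digits are correct but only half of the ZIP Codes are, which is the kind of gap a number-level metric exposes and a digit-level error rate hides.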