Handwriting styles: benchmarks and evaluation metrics
This work addresses the need for personalized human-computer interaction systems by establishing benchmarks for handwriting style evaluation, but it is incremental as it focuses on setting up evaluation frameworks rather than novel generation methods.
The paper tackles the problem of evaluating handwriting style generation, which is challenging due to its ill-defined nature, by proposing baseline benchmarks and evaluation metrics using the IRON-OFF dataset, with no prior work identified in this area.
Evaluating the style of generated handwriting is a challenging problem, since style itself is not well defined. It is nonetheless a key component in developing systems that offer more personalized experiences to humans. In this paper, we propose baseline benchmarks that serve as anchors for estimating the relative quality of different handwriting style methods. We build these benchmarks with deep learning techniques, which have shown remarkable results across machine learning tasks, including classification, regression, and, most relevant to our work, the generation of temporal sequences. We discuss the challenges of evaluating our methods, which are related to the evaluation of generative models in general. We then propose evaluation metrics that we find relevant to this problem, and we discuss how we assess those metrics themselves. In this study, we use the IRON-OFF dataset. To the best of our knowledge, no prior work has used this dataset to generate handwriting (in terms of either methodology or performance metrics) or to explore handwriting styles.