LG, Aug 18, 2016

Caveats on Bayesian and hidden-Markov models (v2.8)

arXiv:1608.05277v3

Originality: Synthesis-oriented

AI Analysis

The paper addresses practical problems in handwriting recognition that matter to researchers and practitioners, though its contribution is incremental: it mainly highlights known limitations.

The paper identifies fundamental issues in applying hidden-Markov and Bayesian models to cursive-script recognition, such as error propagation in probability products and the appropriateness of the Markov assumption, and presents a non-Bayesian, non-Markov method achieving very acceptable results with minimal training.
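As a rough illustration of why error propagates through a product of probabilities, the following is a standard first-order argument, not the paper's own derivation. The symbols are assumptions introduced here: $\hat{p}_i$ is an estimate of the true probability $p_i$, $\epsilon_i$ is its relative error, and the errors are assumed to be small, independent, and zero-mean with variance $\sigma^2$.

```latex
\begin{align}
  \hat{p}_i &= p_i\,(1 + \epsilon_i), \qquad i = 1, \dots, n \\
  \hat{P} = \prod_{i=1}^{n} \hat{p}_i
          &= P \prod_{i=1}^{n} (1 + \epsilon_i),
          \qquad P = \prod_{i=1}^{n} p_i \\
  \frac{\hat{P} - P}{P}
          &= \prod_{i=1}^{n} (1 + \epsilon_i) - 1
          \;\approx\; \sum_{i=1}^{n} \epsilon_i \\
  \log \hat{P} - \log P
          &= \sum_{i=1}^{n} \log(1 + \epsilon_i) \\
  \operatorname{Var}\!\left[\frac{\hat{P} - P}{P}\right]
          &\approx n\,\sigma^2
\end{align}
```

Under these assumptions the relative error of the product grows with the number of factors $n$, which is why long chains of multiplied probability estimates become unreliable.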

This paper describes a number of fundamental and practical problems in applying hidden-Markov models and Bayesian methods to cursive-script recognition. Several of these problems, however, also affect other application areas. The most fundamental problem is the propagation of error in the product of probabilities, a common and pervasive problem that deserves more attention. Tables of the expected relative error, obtained by Monte Carlo modeling, are given. The relative error appears to follow a continuous Poisson distribution over log probabilities. A second essential problem concerns the appropriateness of the Markov assumption. Basic tests reveal whether a problem requires modeling of the stochastics of seriality at all. Examples are given of lexical encodings with sequence information removed that achieve 95-99% classification accuracy on a lexicon for several European languages. Finally, a summary is presented of results on a non-Bayes, non-Markov method in handwriting recognition, which gives very acceptable results with minimal modeling or training requirements using nearest-mean classification.
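The idea of testing whether sequence information is needed at all can be sketched in a few lines. The abstract does not specify the paper's lexical encodings, so this example assumes the simplest order-free encoding (the sorted multiset of a word's letters) and measures what fraction of a lexicon it still identifies uniquely. The function names and the toy lexicon are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def order_free_key(word: str) -> str:
    """Encode a word with all sequence information removed.

    Assumption: the encoding is the sorted multiset of letters,
    so anagrams collapse onto the same key.
    """
    return "".join(sorted(word.lower()))


def unique_fraction(lexicon) -> float:
    """Fraction of distinct words whose order-free key is unique in the lexicon."""
    buckets = defaultdict(set)
    for word in lexicon:
        buckets[order_free_key(word)].add(word.lower())
    total = sum(len(members) for members in buckets.values())
    if total == 0:
        raise ValueError("lexicon must contain at least one word")
    # A word is recoverable without order only if no other word shares its key.
    unique = sum(1 for members in buckets.values() if len(members) == 1)
    return unique / total


# Toy lexicon (hypothetical): "stop", "post" and "spot" are anagrams of each other.
toy_lexicon = ["stop", "post", "spot", "letter", "markov", "bayes", "script", "cursive"]
print(unique_fraction(toy_lexicon))  # 0.625
```

On a real lexicon, a high unique fraction under an order-free encoding of this kind would suggest that explicit modeling of serial order adds little for word classification, which is the sort of basic test the abstract describes.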
