Parallel texts of book translations in the quality evaluation of basic models and algorithms for symbol string similarity
This work provides a reproducible method for evaluating the quality of string similarity algorithms. The contribution is incremental, since it applies existing metrics in a new data context.
The paper tackles the problem of evaluating string similarity metrics. It uses parallel translations of texts to rank paragraphs, measures the position of the correct translation in that ranking, and, through this objective assessment, identifies the most accurate metrics.
The numeric evaluation of string metric accuracy is based on the following idea: take a paragraph of text in one language, sort all paragraphs of the document in the other language by their similarity to the given paragraph string, and use the rank of the correct translation as the evaluation score. Such a search for the proper translation provides an objective and reproducible quality assessment of known similarity metrics and shows which of them are the most accurate.
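A minimal Python sketch of the ranking procedure, assuming the two documents are already split into paragraph-aligned lists (`source[i]` is translated by `target[i]`). The names `levenshtein_similarity`, `translation_rank` and `mean_rank` are hypothetical. Normalized Levenshtein similarity and `difflib`'s ratio are only two example metrics. Pessimistic tie-breaking and averaging the ranks into a single score are assumptions, not necessarily the choices made in the paper.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence

Similarity = Callable[[str, str], float]


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Example metric (assumption): edit distance mapped to [0, 1], 1.0 = identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def translation_rank(i: int, source: Sequence[str], target: Sequence[str],
                     similarity: Similarity) -> int:
    """1-based place of target[i] when all target paragraphs are sorted by
    similarity to source[i]. Ties count against the correct translation
    (pessimistic tie-breaking is an assumption)."""
    scores = [similarity(source[i], t) for t in target]
    correct = scores[i]
    rivals = sum(1 for j, s in enumerate(scores) if j != i and s >= correct)
    return 1 + rivals


def mean_rank(source: Sequence[str], target: Sequence[str],
              similarity: Similarity) -> float:
    """Average rank over all source paragraphs; 1.0 is a perfect score."""
    if not source or len(source) != len(target):
        raise ValueError("source and target must be non-empty and paragraph-aligned")
    ranks = [translation_rank(i, source, target, similarity)
             for i in range(len(source))]
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical toy paragraphs, not data from the paper.
    src = ["The national university library",
           "Information systems and technologies",
           "Open the window"]
    tgt = ["La biblioteca de la universidad nacional",
           "Sistemas y tecnologías de información",
           "Abre la ventana"]
    metrics = {
        "normalized Levenshtein": levenshtein_similarity,
        "difflib ratio": lambda a, b: SequenceMatcher(None, a, b).ratio(),
    }
    for name, metric in metrics.items():
        print(f"{name}: mean rank {mean_rank(src, tgt, metric):.2f}")
```

A lower mean rank indicates a more accurate metric, and 1.0 means the correct translation always comes first. Pessimistic tie-breaking ensures that a degenerate metric giving every paragraph the same score receives the worst rank instead of a lucky first place.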