Agglomerative Clustering of Handwritten Numerals to Determine Similarity of Different Languages
This research provides a method for linguists and historians to infer relationships between languages based on the visual characteristics of their handwritten numerals, offering insights into regional origins.
This paper explores the similarity of handwritten numerals across different languages to infer regional resemblances and potential shared linguistic origins. It does so by constructing a similarity measure with a Siamese network, estimating dataset-level similarity with a novel similarity-averaging technique based on random sampling with replacement, and then performing agglomerative clustering of the numeral datasets.
Handwritten numerals of different languages have distinct visual characteristics, so the similarities and dissimilarities between languages can be measured by analyzing features extracted from their numerals. Handwritten numeral datasets are available for many well-known languages from different regions. In this paper, several such datasets are collected and used to measure the similarity among the written languages by determining and comparing the similarity of each handwritten numeral. This helps identify which languages share the same or a closely related parent language. First, a similarity measure between two numeral images is constructed with a Siamese network. Second, the similarity between numeral datasets is determined using the Siamese network together with a new similarity-averaging technique based on random sampling with replacement. Finally, agglomerative clustering is performed on the pairwise similarities of the datasets. This clustering reveals several interesting properties of the datasets; the property this paper focuses on is their regional resemblance. By analyzing the clusters, one can readily identify which languages originated in similar regions.
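The three-step pipeline can be sketched in Python. This is a minimal sketch under several assumptions that the abstract does not state. The trained Siamese network is replaced by a hypothetical `placeholder_similarity` function (1 / (1 + Euclidean distance) between toy feature vectors). Each dataset is assumed to be a dictionary mapping digit labels to lists of images. The averaging step is read as drawing `n_samples` images with replacement per digit from each dataset and averaging the pairwise similarities. Average linkage is assumed for the clustering. The names `toy_dataset`, `dataset_similarity`, `similarity_matrix`, and `agglomerative` are illustrative only.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[float]                 # hypothetical: a numeral image as a feature vector
Dataset = Dict[int, List[Image]]        # hypothetical: digit label -> images of that digit
SimilarityFn = Callable[[Image, Image], float]

def placeholder_similarity(x: Image, y: Image) -> float:
    # Stand-in for a trained Siamese network: shared "embedding" (identity)
    # followed by a distance mapped into (0, 1].
    return 1.0 / (1.0 + math.dist(x, y))

def dataset_similarity(a: Dataset, b: Dataset, sim: SimilarityFn,
                       n_samples: int = 100, seed: int = 0) -> float:
    # Assumed reading of the averaging technique: for each shared digit, sample
    # images with replacement from both datasets, average the pairwise
    # similarities, then average over digits.
    rng = random.Random(seed)
    per_digit = []
    for d in sorted(set(a) & set(b)):
        xs = rng.choices(a[d], k=n_samples)  # sampling with replacement
        ys = rng.choices(b[d], k=n_samples)
        per_digit.append(sum(sim(x, y) for x, y in zip(xs, ys)) / n_samples)
    return sum(per_digit) / len(per_digit)

def similarity_matrix(datasets: Dict[str, Dataset], sim: SimilarityFn,
                      n_samples: int = 100) -> Tuple[List[str], List[List[float]]]:
    names = list(datasets)
    n = len(names)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = dataset_similarity(datasets[names[i]], datasets[names[j]], sim, n_samples)
            m[i][j] = m[j][i] = s
    return names, m

def agglomerative(names: List[str], m: List[List[float]]):
    # Average-linkage agglomerative clustering on a similarity matrix:
    # repeatedly merge the two clusters with the highest mean similarity.
    clusters = [[i] for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [m[p][q] for p in clusters[i] for q in clusters[j]]
                s = sum(pairs) / len(pairs)
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append(([names[k] for k in clusters[i]], [names[k] for k in clusters[j]], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def toy_dataset(offset: float, rng: random.Random) -> Dataset:
    # Hypothetical synthetic "language": 20 noisy 2-D points per digit.
    return {d: [[d + offset + rng.gauss(0, 0.1), offset + rng.gauss(0, 0.1)]
                for _ in range(20)] for d in range(10)}

rng = random.Random(42)
data = {"A": toy_dataset(0.0, rng), "B": toy_dataset(0.2, rng), "C": toy_dataset(3.0, rng)}
names, m = similarity_matrix(data, placeholder_similarity)
merges = agglomerative(names, m)
for left, right, s in merges:
    print(left, right, round(s, 3))
```

Because the merge with the highest average similarity is taken at each step, the merge order itself traces the dendrogram: in this toy setup the two synthetic languages with nearby numerals ("A" and "B") join first, and the distant one ("C") joins last.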