CL · Sep 30, 2021

Phonetic Word Embeddings

Rahul Sharma, Kunal Dhawan, Balakrishna Pailla

arXiv:2109.14796v1 · 8 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of phonetic similarity for computational phonology tasks, offering an incremental improvement with new benchmarking tools.

The paper tackles phonetic similarity between words by introducing a perception-motivated similarity measure and using it to learn continuous vector embeddings that group similar-sounding words, with performance gains over previous work on established tests for English and Hindi. It also introduces a heterographic pun dataset for benchmarking acoustic similarity algorithms.
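As an illustration of what a phonetic similarity score between two words can look like, the sketch below computes a weighted edit distance over phoneme sequences, where substituting acoustically close phonemes (sharing more features) costs less than substituting distant ones. This is not the authors' method: the ARPAbet-style symbols, the tiny `FEATURES` table, the Jaccard-based substitution cost and the names `phonetic_distance` / `phonetic_similarity` are all hypothetical assumptions chosen for a self-contained example.

```python
# Hypothetical, simplified phoneme features (ARPAbet-style symbols).
# This is NOT the feature set or cost model used in the paper.
FEATURES = {
    "P": {"consonant", "bilabial", "stop"},
    "B": {"consonant", "bilabial", "stop", "voiced"},
    "M": {"consonant", "bilabial", "nasal", "voiced"},
    "T": {"consonant", "alveolar", "stop"},
    "D": {"consonant", "alveolar", "stop", "voiced"},
    "N": {"consonant", "alveolar", "nasal", "voiced"},
    "S": {"consonant", "alveolar", "fricative"},
    "Z": {"consonant", "alveolar", "fricative", "voiced"},
    "AE": {"vowel", "front", "low"},
    "EH": {"vowel", "front", "mid"},
    "IH": {"vowel", "front", "high"},
    "AA": {"vowel", "back", "low"},
}


def substitution_cost(a, b):
    """Jaccard distance between feature sets: 0 for identical phonemes, 1 for disjoint ones."""
    if a == b:
        return 0.0
    fa, fb = FEATURES[a], FEATURES[b]
    return 1.0 - len(fa & fb) / len(fa | fb)


def phonetic_distance(x, y, indel=1.0):
    """Weighted Levenshtein distance between two phoneme sequences."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel
    for j in range(1, m + 1):
        dp[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel,  # deletion
                dp[i][j - 1] + indel,  # insertion
                dp[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1]),
            )
    return dp[n][m]


def phonetic_similarity(x, y):
    """Normalize distance into [0, 1]; 1.0 means identical phoneme sequences."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    return 1.0 - phonetic_distance(x, y) / longest


bat, pat, sat = ["B", "AE", "T"], ["P", "AE", "T"], ["S", "AE", "T"]
print(phonetic_similarity(bat, pat), phonetic_similarity(bat, sat))
```

With these toy features, "bat" scores closer to "pat" (B and P differ only in voicing) than to "sat". Because every substitution costs at most 1 and the indel cost is 1, the distance never exceeds the longer sequence's length, so the normalized score stays in [0, 1]. Pairwise scores of this kind are one possible training signal for an embedding space that places similar-sounding words close together.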

This work presents a novel methodology for calculating the phonetic similarity between words, motivated by the human perception of sounds. This metric is used to learn a continuous vector embedding space that groups similar-sounding words together and can be used for various downstream computational phonology tasks. The efficacy of the method is demonstrated for two languages (English and Hindi), and performance gains over previously reported work are discussed on established tests for predicting phonetic similarity. To address the limited benchmarking mechanisms in this field, we also introduce an evaluation methodology based on a heterographic pun dataset to compare the effectiveness of acoustic similarity algorithms. Further, a visualization of the embedding space is presented, along with a discussion of possible use cases for this novel algorithm. An open-source implementation is also shared to aid reproducibility and enable adoption in related tasks.
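One way a heterographic pun dataset can serve as a benchmark: each pun pairs the written word with a different-spelled, similar-sounding intended word, so a good phonetic embedding should rank the intended word among the nearest neighbours of the pun word. The sketch below scores that with a hit@k rate over cosine similarity. It is a hypothetical protocol for illustration, not necessarily the paper's exact evaluation; the 2-D vectors, the word pairs and the `hit_at_k` helper are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def hit_at_k(embeddings, pun_pairs, k=1):
    """Fraction of (pun_word, target_word) pairs whose target is among the
    k nearest neighbours (by cosine similarity) of the pun word."""
    hits = 0
    for pun_word, target in pun_pairs:
        query = embeddings[pun_word]
        neighbours = sorted(
            (w for w in embeddings if w != pun_word),
            key=lambda w: cosine(query, embeddings[w]),
            reverse=True,
        )
        if target in neighbours[:k]:
            hits += 1
    return hits / len(pun_pairs)


# Hypothetical toy embeddings; a real evaluation would use learned vectors.
emb = {
    "flour": [1.0, 0.1],
    "flower": [0.9, 0.2],
    "knight": [0.1, 1.0],
    "night": [0.2, 0.9],
    "table": [-1.0, 0.0],
}
pairs = [("flour", "flower"), ("knight", "night")]
print(hit_at_k(emb, pairs, k=1))
```

A higher hit@k means the embedding space recovers more intended words from their puns. Sorting the full vocabulary is fine for a sketch; for a large vocabulary, an approximate nearest-neighbour index would be the usual choice.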

