Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
This work addresses low-resource phoneme recognition across multiple languages, using multi-task learning to reduce phoneme error rate over a baseline without it.
The paper tackles cross-lingual phoneme recognition by proposing Allophant, a multilingual phoneme recognizer that requires only a phoneme inventory for low-resource transfer to a target language. Compared to a baseline without multi-task learning, it improves phoneme error rate (PER) by 11 percentage points on supervised languages and lowers PER by 2.63 percentage points in zero-shot transfer.
This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task architecture. We also introduce Allophoible, an extension of the PHOIBLE database. When combined with a distance-based mapping approach for grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly. By training and evaluating on 34 languages, we found that the addition of multi-task learning improves the model's ability to handle unseen phonemes and phoneme inventories. On supervised languages we achieve phoneme error rate (PER) improvements of 11 percentage points (pp.) compared to a baseline without multi-task learning. Evaluation of zero-shot transfer on 84 languages yielded a decrease in PER of 2.63 pp. over the baseline.
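To make the idea of a distance-based mapping from grapheme-to-phoneme outputs onto a target inventory more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Hamming distance over a toy set of four binary articulatory features; the feature table, the function names `hamming` and `map_to_inventory`, and the tie-breaking rule are all hypothetical choices made for illustration. The abstract does not specify the distance measure or the features used.

```python
# Toy binary articulatory features: (voiced, nasal, labial, coronal).
# Illustrative values only; NOT taken from PHOIBLE or Allophoible.
FEATURES = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 1),
    "d": (1, 0, 0, 1),
    "n": (1, 1, 0, 1),
}


def hamming(a, b):
    """Count the feature positions in which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def map_to_inventory(phonemes, inventory):
    """Map each phoneme to itself if it is in the inventory, otherwise to
    the inventory phoneme with the smallest feature distance.

    Assumption: ties are broken by inventory order, since min() returns
    the first minimal element it encounters.
    """
    mapped = []
    for ph in phonemes:
        if ph in inventory:
            mapped.append(ph)
        else:
            nearest = min(
                inventory,
                key=lambda cand: hamming(FEATURES[ph], FEATURES[cand]),
            )
            mapped.append(nearest)
    return mapped


# Hypothetical G2P output containing /b/, which the target inventory lacks.
g2p_output = ["b", "t"]
target_inventory = ["p", "t", "n"]
mapped = map_to_inventory(g2p_output, target_inventory)
print(mapped)
```

In this toy setup /b/ differs from /p/ only in voicing, while it differs from /t/ and /n/ in three features each, so it is mapped to /p/; /t/ is already in the inventory and is kept unchanged. A realistic system would use a much richer feature set, such as the articulatory attributes stored in a phonological database.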