CL, LG
Jul 24, 2022

Composing RNNs and FSTs for Small Data: Recovering Missing Characters in Old Hawaiian Text

Oxford
arXiv:2208.10248v1
2 citations
h-index: 21
Originality: Incremental advance
AI Analysis

This paper addresses the laborious task of manually transliterating Hawaiian text, which matters for Hawaiian language preservation. The advance is incremental, since it adapts existing methods to a small-data scenario.

The paper tackles the problem of automatically transliterating old Hawaiian text into modern orthography, which requires restoring the characters for long vowels and glottal stops that the old writing system omits. It introduces a hybrid method that composes finite state transducers (FSTs) with recurrent neural networks (RNNs), and this hybrid outperforms an end-to-end FST approach.

In contrast to the older writing system of the 19th century, modern Hawaiian orthography employs characters for long vowels and glottal stops. These extra characters account for about one-third of the phonemes in Hawaiian, so including them makes a big difference to reading comprehension and pronunciation. However, transliterating between older and newer texts is a laborious task when performed manually. We introduce two related methods to help solve this transliteration problem automatically, given that there were not enough data to train an end-to-end deep learning model. One method is implemented, end-to-end, using finite state transducers (FSTs). The other is a hybrid deep learning approach which approximately composes an FST with a recurrent neural network (RNN). We find that the hybrid approach outperforms the end-to-end FST by partitioning the original problem into one part that can be modelled by hand, using an FST, and into another part, which is easily solved by an RNN trained on the available data.
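The partition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`to_old`, `candidates`, `restore`) are hypothetical, a plain Python enumeration stands in for the hand-built FST, and a toy word-count lookup stands in for the trained RNN. It also assumes the glottal stop (ʻokina) appears only directly before a vowel. The point is only that the modern-to-old direction is deterministic, while the old-to-modern direction is one-to-many and needs a learned scorer to pick among candidates.

```python
import unicodedata
from itertools import product

OKINA = "\u02bb"   # the glottal stop letter in modern orthography
MACRON = "\u0304"  # combining macron, marks a long vowel
VOWELS = "aeiou"

def to_old(modern: str) -> str:
    """Modern -> old orthography: drop glottal stops and macrons (deterministic)."""
    decomposed = unicodedata.normalize("NFD", modern)
    stripped = "".join(c for c in decomposed if c not in (OKINA, MACRON))
    return unicodedata.normalize("NFC", stripped)

def candidates(old: str) -> list[str]:
    """Old -> every modern spelling consistent with it (one-to-many).

    Assumption of this sketch: each vowel may be short or long, and may or
    may not be preceded by a glottal stop. Consonants pass through unchanged.
    """
    options = []
    for ch in old:
        if ch.lower() in VOWELS:
            long_vowel = unicodedata.normalize("NFC", ch + MACRON)
            options.append([ch, long_vowel, OKINA + ch, OKINA + long_vowel])
        else:
            options.append([ch])
    return ["".join(parts) for parts in product(*options)]

def restore(old: str, score) -> str:
    """Pick the candidate that the scorer rates highest."""
    return max(candidates(old), key=score)

# Toy scorer standing in for a trained model: counts from a tiny word list.
toy_counts = {"Hawai\u02bbi": 3, "k\u0101ne": 2}

def toy_score(word: str) -> int:
    return toy_counts.get(word, 0)

restored = restore("Hawaii", toy_score)
```

Even for a six-letter word the candidate set grows exponentially in the number of vowels, which is why the hand-written part only enumerates the possibilities and the choice among them is left to a model trained on the available data.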

