Neural Proto-Language Reconstruction
This work targets proto-language reconstruction, a painstaking manual process for linguists. Its contribution is incremental, since it builds on existing RNN and Transformer methods.
The paper tackles automated proto-form reconstruction by improving existing computational models. A VAE-enhanced Transformer performs better on the WikiHan dataset, and data augmentation stabilizes training.
Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNNs and Transformers have been proposed to automate this process. We take three approaches to improve upon previous methods: data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model performs better on the WikiHan dataset, and that the data augmentation step stabilizes training.
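
The abstract does not say how the VAE structure is attached to the Transformer, so the following is only a minimal sketch of the two generic ingredients any VAE adds to a model: the reparameterized latent sample and the KL penalty against a standard normal prior. The function names, the `beta` weight, and the use of plain Python lists in place of tensors are all assumptions made for illustration, not the paper's implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    mu and log_var are hypothetical encoder outputs (lists of floats).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_nll, mu, log_var, beta=1.0):
    """Reconstruction loss plus a beta-weighted KL term.

    beta is an assumed weighting knob; the source does not specify one.
    """
    return reconstruction_nll + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print(len(z), vae_loss(2.0, mu, log_var))
```

In a setup like this, the decoder would condition on `z` in addition to its usual inputs, and the KL term would act as a regularizer on the latent space. Whether the paper's model follows this exact pattern is not stated in the abstract.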