Revisiting the Hierarchical Multiscale LSTM
This work addresses reproducibility and usability challenges faced by computational linguistics researchers who use complex deep-learning models, though its contribution is incremental because it builds on an existing architecture.
The paper addresses the complexity and reproducibility issues of the Hierarchical Multiscale LSTM language model through a reproduction and ablation study, showing that simplifying certain aspects of the architecture can improve performance and that the quality of the learned linguistic units does not correlate with the model's overall language-modeling performance.
The Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, its training procedure, and its implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of repurposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the overall performance of the model on language modeling.