CL ML Jan 21, 2022

Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing

arXiv:2201.08919v131.0584 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of capturing hierarchical structures in text for NLP applications, offering an incremental improvement over existing RNN models.

The paper tackles the problem of learning hierarchical representations in natural language processing by proposing the EM-HRNN model, which uses a latent indicator layer and an EM algorithm to integrate implicit hierarchical information and attention mechanisms. The model outperformed other RNN-based models in document classification tasks, with performance comparable to BERT-base despite being smaller and not requiring pre-training.

How to obtain hierarchical representations with increasing levels of abstraction has become one of the key issues in learning with deep neural networks. A variety of RNN models have recently been proposed in the literature to incorporate both explicit and implicit hierarchical information when modeling language. In this paper, we propose a novel approach called the latent indicator layer to identify and learn implicit hierarchical information (e.g., phrases), and further develop an EM algorithm to handle the latent indicator layer in training. The latent indicator layer further simplifies a text's hierarchical structure, which allows us to seamlessly integrate different levels of attention mechanisms into the structure. We call the resulting architecture the EM-HRNN model. Furthermore, we develop two bootstrap strategies to effectively and efficiently train the EM-HRNN model on long text documents. Simulation studies and real data applications demonstrate that the EM-HRNN model with bootstrap training outperforms other RNN-based models in document classification tasks. The performance of the EM-HRNN model is comparable to that of a Transformer-based method called BERT-base, though the former is a much smaller model and does not require pre-training.
