GWPT: A Green Word-Embedding-based POS Tagger
This work provides a more efficient POS tagger for NLP applications, though the contribution is incremental because it builds on the existing green learning methodology.
The paper tackles part-of-speech tagging by proposing GWPT, a lightweight tagger based on word embeddings, which achieves state-of-the-art accuracies with fewer parameters and lower computational complexity compared to deep learning methods.
As a fundamental tool for natural language processing (NLP), a part-of-speech (POS) tagger assigns a POS label to each word in a sentence. This work proposes a novel lightweight POS tagger based on word embeddings, named GWPT (green word-embedding-based POS tagger). Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning. The main novelty of GWPT lies in representation learning. It uses non-contextual or contextual word embeddings, partitions embedding dimension indices into low-, medium-, and high-frequency sets, and represents them with different N-grams. Experimental results show that GWPT offers state-of-the-art accuracies with fewer model parameters and significantly lower computational complexity in both training and inference as compared with deep-learning-based methods.
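To make the representation-learning idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the frequency of an embedding dimension is measured, how the three sets are sized, or which N goes with which set, so everything below is an assumption made for illustration: the function names (`sign_change_ratio`, `partition_dims`, `ngram_features`, `represent`) are hypothetical, frequency is approximated by how often a mean-removed dimension changes sign between adjacent words, the sets are equal-sized thirds, and the N-gram sizes are passed in by the caller.

```python
import numpy as np

def sign_change_ratio(sentences):
    """Assumed frequency proxy: per dimension, the fraction of adjacent
    word pairs whose mean-removed embedding value changes sign.
    `sentences` is a list of arrays of shape (T_i, D)."""
    mean = np.concatenate(sentences, axis=0).mean(axis=0)
    changes = np.zeros(mean.shape[0])
    pairs = 0
    for s in sentences:
        if len(s) < 2:
            continue
        signs = np.sign(s - mean)
        changes += (signs[1:] != signs[:-1]).sum(axis=0)
        pairs += len(s) - 1
    return changes / max(pairs, 1)

def partition_dims(freq, low_frac=1 / 3, high_frac=1 / 3):
    """Split dimension indices into low-, medium-, and high-frequency sets.
    The equal-thirds split is an assumption, not taken from the paper."""
    order = np.argsort(freq)  # ascending frequency
    d = len(freq)
    n_low = int(round(d * low_frac))
    n_high = int(round(d * high_frac))
    return order[:n_low], order[n_low:d - n_high], order[d - n_high:]

def ngram_features(sentence, dims, n):
    """For every word, concatenate the selected dimensions of the n words
    centered on it (zero-padded at the sentence boundaries)."""
    assert n % 2 == 1, "this sketch only handles odd, centered windows"
    t = sentence.shape[0]
    half = n // 2
    padded = np.pad(sentence[:, dims], ((half, half), (0, 0)))
    return np.concatenate([padded[k:k + t] for k in range(n)], axis=1)

def represent(sentence, groups, ngram_sizes):
    """Concatenate the N-gram features of each frequency group."""
    parts = [ngram_features(sentence, d, n) for d, n in zip(groups, ngram_sizes)]
    return np.concatenate(parts, axis=1)

# Toy usage with random stand-ins for word embeddings (D = 6).
rng = np.random.default_rng(0)
corpus = [rng.standard_normal((5, 6)), rng.standard_normal((7, 6))]
groups = partition_dims(sign_change_ratio(corpus))
# Illustrative choice only: wider windows for lower-frequency dimensions.
features = represent(corpus[0], groups, ngram_sizes=(5, 3, 1))
```

Each row of `features` is the per-word vector that would be handed to the feature-learning and decision-learning modules. The assignment of window sizes to frequency sets is left as a caller-supplied parameter because the abstract only states that the sets use different N-grams.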