CL · Jun 7, 2019

Learning Word Embeddings with Domain Awareness

arXiv:1906.03249v3 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation in NLP and is relevant to both researchers and practitioners, but the advance is incremental because it builds on the existing skip-gram (SG) and continuous bag-of-words (CBOW) models.

The paper tackles the poor performance of word embeddings trained on data from heterogeneous domains by proposing domain-aware training mechanisms, which the authors report to be especially effective in near-cold-start scenarios.

Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific design for incorporating domain knowledge. This can lead to unsatisfactory performance when training data originate from heterogeneous domains. In this paper, we propose two novel mechanisms for domain-aware word embedding training, namely domain indicator and domain attention, which integrate domain-specific knowledge into the widely used SG and CBOW models, respectively. The two methods are based on a joint learning paradigm and ensure that words in a target domain receive intensive focus when training on a source domain corpus. Qualitative and quantitative evaluations confirm the validity and effectiveness of our models. Compared to baseline methods, our method is particularly effective in near-cold-start scenarios.
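
The abstract names a domain indicator for SG but does not spell out its form. As a rough illustration only, the sketch below shows one way such an indicator could enter a skip-gram negative-sampling loss: a shared vector is added to the center word's embedding when that word belongs to the target-domain vocabulary. The additive form, the names domain_aware_center, domain_vec and in_target_domain, and the toy sizes are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    # -log sigmoid(u.v) for the observed context word
    pos = np.logaddexp(0.0, -(center_vec @ context_vec))
    # -log sigmoid(-u.n) summed over the sampled negative words
    neg = np.logaddexp(0.0, negative_vecs @ center_vec).sum()
    return pos + neg

def domain_aware_center(word_id, word_emb, domain_vec, in_target_domain):
    """Hypothetical indicator (an assumption, not the paper's definition):
    shift the center vector by a shared domain vector when the word is in
    the target-domain vocabulary; otherwise return it unchanged."""
    vec = word_emb[word_id]
    return vec + domain_vec if in_target_domain[word_id] else vec

# Toy setup: 6 words, 4 dimensions, randomly initialised parameters.
rng = np.random.default_rng(0)
vocab_size, dim = 6, 4
word_emb = rng.normal(scale=0.1, size=(vocab_size, dim))
ctx_emb = rng.normal(scale=0.1, size=(vocab_size, dim))
domain_vec = rng.normal(scale=0.1, size=dim)
in_target_domain = np.array([True, False, True, False, False, True])

center, context, negatives = 0, 1, [3, 4]
baseline = sgns_loss(word_emb[center], ctx_emb[context], ctx_emb[negatives])
aware = sgns_loss(
    domain_aware_center(center, word_emb, domain_vec, in_target_domain),
    ctx_emb[context],
    ctx_emb[negatives],
)
print(f"plain SG loss: {baseline:.4f}, with domain indicator: {aware:.4f}")
```

In a joint-learning setup, domain_vec would be trained together with the word vectors, so gradients from target-domain words would shape it directly. That is one reading of "intensively focused" and is not confirmed by the text above.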

Code Implementations: 1 repo
