CL, Jan 9, 2016

Empirical Gaussian priors for cross-lingual transfer learning

arXiv:1601.02166v1
Originality: Incremental advance
AI Analysis

The paper addresses overfitting to noisy projected labels in cross-lingual NLP. The advance is incremental: it builds on existing norm-based regularization methods.

The paper tackles overfitting in cross-lingual part-of-speech tagging, where the training labels are projected from source languages and are therefore noisy. It proposes empirical Gaussian priors estimated from source-language models, and reports significantly better performance in multi-source transfer setups.

Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss + norm). In cross-lingual part-of-speech (POS) tagging, our target language training data consists of sequences of sentences with word-by-word labels projected from translations in $k$ languages for which we have labeled data, via word alignments. Our training data is therefore very noisy, and if Rademacher complexity is high, learning algorithms are prone to overfit. Norm-based regularization assumes a constant width and zero mean prior. We instead propose to use the $k$ source language models to estimate the parameters of a Gaussian prior for learning new POS taggers. This leads to significantly better performance in multi-source transfer set-ups. We also present a drop-out version that injects (empirical) Gaussian noise during online learning. Finally, we note that using empirical Gaussian priors leads to much lower Rademacher complexity, and is superior to optimally weighted model interpolation.
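The contrast the abstract draws between norm-based regularization (a zero-mean, constant-width prior) and an empirical Gaussian prior can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each of the $k$ source models is a flat weight vector over a shared feature space, and that the prior is a diagonal Gaussian whose per-parameter mean and variance are taken across the source models. The function names and the `min_var` floor are hypothetical.

```python
import numpy as np


def estimate_empirical_prior(source_weights, min_var=1e-6):
    """Estimate a diagonal Gaussian prior from k source-model weight vectors.

    Assumption: source_weights has shape (k, d), one row per source model.
    min_var is a hypothetical floor that keeps the variance strictly positive
    when all source models agree on a parameter.
    """
    W = np.asarray(source_weights, dtype=float)
    mu = W.mean(axis=0)                        # per-parameter prior mean
    var = np.maximum(W.var(axis=0), min_var)   # per-parameter prior variance
    return mu, var


def gaussian_prior_penalty(w, mu, var):
    """Negative log-density of the diagonal Gaussian prior, up to a constant."""
    w = np.asarray(w, dtype=float)
    return 0.5 * np.sum((w - mu) ** 2 / var)


def gaussian_prior_gradient(w, mu, var):
    """Gradient of the penalty with respect to w, for use in an online update."""
    w = np.asarray(w, dtype=float)
    return (w - mu) / var


if __name__ == "__main__":
    # Two toy source models over two features (illustrative values only).
    mu, var = estimate_empirical_prior([[1.0, 0.0], [3.0, 0.0]])
    w = np.array([3.0, 0.0])
    print(gaussian_prior_penalty(w, mu, var))
    print(gaussian_prior_gradient(w, mu, var))
```

Setting `mu` to zero and `var` to a single constant recovers the usual squared-norm penalty, which is the sense in which norm-based regularization is a special case. Under the empirical prior, parameters on which the source models agree (low variance) are held close to the shared value, while parameters on which they disagree are left freer to fit the target data.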

