CL | Mar 31, 2022

Domain Adaptation for Sparse-Data Settings: What Do We Gain by Not Using Bert?

arXiv:2203.16926v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of high computational costs for domain adaptation in NLP, providing practical guidelines for resource-constrained settings, though it is incremental in comparing existing methods.

The paper tackles the problem of building NLP applications when labeled training data for a specific domain is scarce. It finds that pre-trained language models like BERT perform best, but that alternative methods can be trained up to 175K times faster without GPUs, offering significant cost savings at only slightly worse performance.

The practical success of much of NLP depends on the availability of training data. However, in real-world scenarios, training data is often scarce, not least because many application domains are restricted and specific. In this work, we compare different methods to handle this problem and provide guidelines for building NLP applications when there is only a small amount of labeled training data available for a specific domain. While transfer learning with pre-trained language models outperforms other methods across tasks, alternatives do not perform much worse while requiring much less computational effort, thus significantly reducing monetary and environmental cost. We examine the performance tradeoffs of several such alternatives, including models that can be trained up to 175K times faster and do not require a single GPU.
