CL | Nov 1, 2023

AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

arXiv:2311.00408v1132 citations | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the inefficiency of domain adaptation for sentence encoders in few-shot classification, offering a practical solution for researchers and practitioners, though it is incremental as it builds on existing pre-training methods.

The paper tackles domain adaptation of sentence embeddings for few-shot classification. It proposes AdaSent, which decouples Sentence Embedding Pre-Training (SEPT) from Domain-Adaptive Pre-Training (DAPT) by training a SEPT adapter on the base PLM that can be inserted into DAPT-ed PLMs from any domain. AdaSent matches or surpasses full SEPT on a DAPT-ed PLM while substantially reducing training costs. The paper also reports that DAPT of a base PLM improves few-shot classification accuracy by up to 8.4 points.

Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) substantially improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT on SEs, on the one hand, disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT needs to be executed on top of a DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs. The code for AdaSent is available.
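The decoupling described in the abstract can be pictured with a small bookkeeping sketch. This is not the authors' implementation and performs no actual training: the classes `PLM`, `SEPTAdapter`, and `SentenceEncoder` and the functions `dapt`, `train_sept_adapter`, `build_encoders`, and `sept_runs` are hypothetical stand-ins, used only to show that the SEPT step runs on the base PLM and that the resulting adapter is reused across domain-adapted backbones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PLM:
    """Hypothetical stand-in for a pre-trained language model backbone."""
    name: str
    domain: str  # "general" for the base PLM


@dataclass(frozen=True)
class SEPTAdapter:
    """Hypothetical stand-in for adapter weights produced by SEPT."""
    trained_on: str


@dataclass(frozen=True)
class SentenceEncoder:
    """A backbone combined with a SEPT adapter."""
    backbone: PLM
    adapter: SEPTAdapter


def dapt(base: PLM, domain: str) -> PLM:
    """Placeholder for unsupervised domain-adaptive pre-training (no real training)."""
    return PLM(name=f"{base.name}+DAPT[{domain}]", domain=domain)


def train_sept_adapter(base: PLM) -> SEPTAdapter:
    """Placeholder for the expensive SEPT step, run on the base PLM only."""
    return SEPTAdapter(trained_on=base.name)


def build_encoders(base: PLM, domains: List[str]) -> Dict[str, SentenceEncoder]:
    """Train one adapter on the base PLM, then insert it into each DAPT-ed PLM."""
    adapter = train_sept_adapter(base)
    return {d: SentenceEncoder(backbone=dapt(base, d), adapter=adapter) for d in domains}


def sept_runs(num_domains: int, decoupled: bool) -> int:
    """Count SEPT runs: one in total when decoupled, one per domain otherwise."""
    return 1 if decoupled else num_domains


base = PLM(name="base-plm", domain="general")
encoders = build_encoders(base, ["medical", "legal", "finance"])
```

Under these assumptions, every entry in `encoders` shares the same adapter object, while each backbone is domain-specific. `sept_runs` makes the cost argument explicit: running SEPT on top of each DAPT-ed PLM scales with the number of domains, whereas the adapter route needs a single SEPT run.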

Code Implementations: 1 repo