Neural Structural Correspondence Learning for Domain Adaptation
This work addresses the problem of adapting NLP models from domains rich in labeled data to domains where such data is scarce, offering an incremental improvement over the existing SCL and MSDA methods.
The paper tackles domain adaptation in NLP by introducing a neural network model that combines structural correspondence learning and autoencoders to learn low-dimensional representations from nonpivot features, improving cross-domain sentiment classification. It outperforms the SCL and MSDA methods by 3.77% and 2.17%, respectively, on average across 12 domain pairs.
Domain adaptation, adapting models from domains rich in labeled training data to domains poor in such data, is a fundamental NLP challenge. We introduce a neural network model that combines ideas from two prominent strands of research on domain adaptation through representation learning: structural correspondence learning (SCL; Blitzer et al., 2006) and autoencoder neural networks. Specifically, our model is a three-layer neural network that learns to encode the nonpivot features of an input example into a low-dimensional representation, so that the existence of pivot features (features that are prominent in both domains and convey useful information for the NLP task) in the example can be decoded from that representation. The low-dimensional representation is then employed in a learning algorithm for the task. Moreover, we show how to inject pre-trained word embeddings into our model in order to improve generalization across examples with similar pivot features. On the task of cross-domain product sentiment classification (Blitzer et al., 2007), consisting of 12 domain pairs, our model outperforms both the SCL and the marginalized stacked denoising autoencoder (MSDA; Chen et al., 2012) methods by 3.77% and 2.17%, respectively, on average across domain pairs.
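To make the encode-then-decode idea concrete, the following is a minimal NumPy sketch of a three-layer network that maps nonpivot features to a hidden representation and predicts pivot presence from it. It is an illustration under stated assumptions, not the implementation evaluated in this work: the sigmoid activations, the cross-entropy loss, the absence of bias terms, the plain gradient-descent updates, the toy data, and all names (`encode`, `train`, `W_enc`, `W_dec`, `hidden_dim`) are hypothetical choices made for the example. The injection of pre-trained word embeddings is not shown.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def encode(X_nonpivot, W_enc):
    """Map nonpivot features to the low-dimensional hidden representation."""
    return sigmoid(X_nonpivot @ W_enc)


def train(X_nonpivot, X_pivot, hidden_dim=4, lr=0.5, epochs=200, seed=0):
    """Fit encoder/decoder weights so pivot presence is predicted from nonpivots."""
    rng = np.random.default_rng(seed)
    n = X_nonpivot.shape[0]
    W_enc = rng.normal(scale=0.1, size=(X_nonpivot.shape[1], hidden_dim))
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, X_pivot.shape[1]))
    losses = []
    eps = 1e-9  # guards the logarithms against log(0)
    for _ in range(epochs):
        H = encode(X_nonpivot, W_enc)  # hidden layer
        P = sigmoid(H @ W_dec)         # predicted pivot probabilities
        # Assumed loss: cross-entropy summed over pivots, averaged over examples.
        loss = -np.sum(
            X_pivot * np.log(P + eps) + (1 - X_pivot) * np.log(1 - P + eps)
        ) / n
        losses.append(loss)
        # Backpropagation for sigmoid outputs with cross-entropy loss.
        dZ_out = (P - X_pivot) / n
        dW_dec = H.T @ dZ_out
        dZ_hid = (dZ_out @ W_dec.T) * H * (1 - H)
        dW_enc = X_nonpivot.T @ dZ_hid
        W_dec -= lr * dW_dec
        W_enc -= lr * dW_enc
    return W_enc, W_dec, losses


# Toy data (made up): 40 examples, 6 binary nonpivot features, 2 binary pivots.
rng = np.random.default_rng(1)
X_nonpivot = rng.integers(0, 2, size=(40, 6)).astype(float)
X_pivot = np.stack(
    [X_nonpivot[:, 0], np.maximum(X_nonpivot[:, 1], X_nonpivot[:, 2])], axis=1
)

W_enc, W_dec, losses = train(X_nonpivot, X_pivot)
H = encode(X_nonpivot, W_enc)  # representation passed to a downstream classifier
print(H.shape, losses[0], losses[-1])
```

In this sketch only the hidden representation `H` is kept after training; the decoder weights `W_dec` serve as a training signal. Because pivot prediction needs no task labels, such a network could in principle be trained on unlabeled examples from both domains, with the task classifier then fit on the encoded labeled source examples.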