CL LG Oct 5, 2021

Using Pseudolabels for training Sentiment Classifiers makes the model generalize better across datasets

arXiv:2110.02200v10.2

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of building robust public sentiment APIs with limited annotation resources, but it is incremental: it applies an existing pseudolabeling technique to a specific domain adaptation problem.

The paper tackles the problem of training a sentiment classifier that generalizes across domains with limited annotated data, showing that using pseudolabels on unannotated data from multiple domains improves cross-dataset generalization.

The problem addressed in this work is: for a public sentiment classification API, how can we set up a classifier that works well on different types of data, given a limited ability to annotate data from across domains? We show that, given a large amount of unannotated data from different domains and pseudolabels on this data generated by a classifier trained on a small annotated dataset from one domain, we can train a sentiment classifier that generalizes better across different datasets.
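The pseudolabeling setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the tiny Naive Bayes classifier, the toy sentences, the function name `pseudolabel_train`, and the choice to train the second model on the annotated and pseudolabeled data together are all hypothetical stand-ins, since the abstract does not specify the model or the exact training mix.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a stand-in for real preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {y: Counter() for y in self.labels}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())

        def score(y):
            total = sum(self.word_counts[y].values())
            s = math.log(self.doc_counts[y] / n_docs)
            for w in tokenize(text):
                s += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
            return s

        return max(self.labels, key=score)


def pseudolabel_train(labeled_texts, labels, unlabeled_texts):
    # Step 1: train a "teacher" on the small annotated single-domain set.
    teacher = NaiveBayes().fit(labeled_texts, labels)
    # Step 2: let the teacher label the unannotated multi-domain data.
    pseudo = [teacher.predict(t) for t in unlabeled_texts]
    # Step 3: train a "student" on annotated plus pseudolabeled data
    # (assumption: the abstract does not say whether both sets are combined).
    student = NaiveBayes().fit(labeled_texts + unlabeled_texts, labels + pseudo)
    return teacher, student, pseudo


# Hypothetical toy data: annotated movie reviews, unannotated product reviews.
labeled_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful acting boring plot",
]
labels = ["pos", "pos", "neg", "neg"]
unlabeled_texts = [
    "great phone loved the battery",
    "terrible phone awful battery",
    "wonderful laptop great screen",
    "boring laptop awful screen",
]

teacher, student, pseudo = pseudolabel_train(labeled_texts, labels, unlabeled_texts)
```

In this sketch the student's vocabulary picks up domain words such as "phone" and "screen" that the teacher never saw, which is one intuition for why pseudolabeled data from other domains might help cross-dataset generalization. Whether that holds in practice is an empirical question the paper addresses.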
