Using Pseudolabels for training Sentiment Classifiers makes the model generalize better across datasets
The work addresses the challenge of building robust public sentiment APIs with limited annotation resources, but its contribution is incremental: it applies an existing pseudolabeling technique to a specific domain adaptation problem.
The paper tackles the problem of training a sentiment classifier that generalizes across domains with limited annotated data, showing that using pseudolabels on unannotated data from multiple domains improves cross-dataset generalization.
The problem statement addressed in this work is: for a public sentiment classification API, how can we set up a classifier that works well on different types of data when we have limited ability to annotate data from across domains? We show that, given a large amount of unannotated data from different domains and pseudolabels on this data generated by a classifier trained on a small annotated dataset from one domain, we can train a sentiment classifier that generalizes better across different datasets.
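The pseudolabeling setup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the naive Bayes classifier, the function names (`train_naive_bayes`, `predict`, `pseudolabel_and_retrain`), the toy sentences, and the choice to train the final model on annotated plus pseudolabeled data together are all assumptions made for illustration. A teacher is fit on a small annotated set from one domain, it labels unannotated text from another domain, and a student is trained on the result.

```python
import math
from collections import Counter


def train_naive_bayes(texts, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenized text."""
    word_counts = {}               # label -> Counter of word frequencies
    doc_counts = Counter(labels)   # label -> number of training documents
    vocab = set()
    for text, label in zip(texts, labels):
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["doc_counts"]):
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(model["doc_counts"][label] / total_docs)
        for word in text.lower().split():
            if word in model["vocab"]:  # words unseen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def pseudolabel_and_retrain(labeled_texts, labels, unlabeled_texts):
    """Teacher labels the unannotated data; student trains on both sets."""
    teacher = train_naive_bayes(labeled_texts, labels)
    pseudolabels = [predict(teacher, t) for t in unlabeled_texts]
    # Assumption: the student sees annotated and pseudolabeled data together.
    student = train_naive_bayes(
        labeled_texts + unlabeled_texts, labels + pseudolabels
    )
    return student, pseudolabels


# Toy data (hypothetical): annotated movie reviews, unannotated product reviews.
movie_texts = [
    "great movie loved it",
    "terrible movie hated it",
    "great acting wonderful plot",
    "awful acting terrible plot",
]
movie_labels = ["pos", "neg", "pos", "neg"]
product_texts = [
    "great phone battery is excellent",
    "terrible phone battery is useless",
    "excellent camera great screen",
    "useless charger awful screen",
]

student, pseudolabels = pseudolabel_and_retrain(
    movie_texts, movie_labels, product_texts
)
print(pseudolabels)                          # ['pos', 'neg', 'pos', 'neg']
print(predict(student, "excellent battery"))  # pos
```

In this toy example the teacher has never seen the word "excellent", so it has no evidence for "excellent battery". The student picks the word up from the pseudolabeled product reviews, which is the intuition behind the cross-domain gain claimed above. A real system would likely use a stronger model, and might filter pseudolabels by confidence, which the sketch omits.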