IR · LG · ML · Aug 28, 2018

Distance Based Source Domain Selection for Sentiment Classification

arXiv:1808.09271v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation in sentiment classification, with applications such as social media analysis. Its contribution is incremental, as it builds on existing distance-based methods.

The paper tackles sentiment classification on unseen domains with few labeled samples. It proposes selecting a source domain from a set of candidates using distance functions between the source and target distributions, and reports a significant reduction in cross-domain classification error compared with a randomly selected source.

Automated sentiment classification (SC) on short text fragments has received increasing attention in recent years. Performing SC on unseen domains with few or no labeled samples can significantly harm classification performance, because sentiment is expressed differently in the source and target domains. In this study, we aim to mitigate this undesired impact by proposing a methodology based on a predictive measure that allows us to select an optimal source domain from a set of candidates. The proposed measure is a linear combination of well-known distance functions between probability distributions supported on the source and target domains (e.g. Earth Mover's distance and Kullback-Leibler divergence). The proposed methodology is validated through an SC case study, in which our numerical experiments suggest a significant improvement in cross-domain classification error compared with a randomly selected source domain, in both a naive and an adaptive learning setting. For more heterogeneous datasets, the predictive capability of the proposed model can further be used to select a subset of candidate domains whose corresponding classifier outperforms one trained on all available source domains. This observation supports the hypothesis that the proposed model may also be deployed to filter out redundant information during the training phase of SC.
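The abstract describes the selection measure only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. Each domain is reduced to an add-one-smoothed unigram distribution over a shared vocabulary. KL is taken as KL(target || source), and that direction is an assumption. The Earth Mover's distance uses a 0/1 ground distance between distinct words, under which it coincides with total variation distance. The weights and bias of the linear combination are hypothetical placeholders that would have to be fitted against observed cross-domain errors. All function names are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary.

    Assumption: a domain is represented only by its word frequencies.
    """
    counts = Counter(tok for t in texts for tok in t.lower().split() if tok in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q); smoothing guarantees q[w] > 0 for every word."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def emd_discrete(p, q):
    """EMD with a 0/1 ground distance between distinct words.

    Under that (assumed) metric, EMD equals the total variation distance.
    """
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)


def predicted_score(source, target, weights=(1.0, 1.0), bias=0.0):
    """Hypothetical linear combination of the two distances.

    Lower scores are read as a lower predicted cross-domain error.
    The weights and bias are placeholders, not fitted values.
    """
    w_kl, w_emd = weights
    return (bias
            + w_kl * kl_divergence(target, source)
            + w_emd * emd_discrete(target, source))


def rank_sources(candidates, target_texts, weights=(1.0, 1.0), bias=0.0):
    """Rank candidate source domains by predicted score, best first."""
    vocab = {tok
             for texts in [target_texts, *candidates.values()]
             for t in texts
             for tok in t.lower().split()}
    target = unigram_distribution(target_texts, vocab)
    scores = {name: predicted_score(unigram_distribution(texts, vocab),
                                    target, weights, bias)
              for name, texts in candidates.items()}
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    candidates = {
        "books": ["great plot and great characters", "boring slow book"],
        "electronics": ["battery life is great", "screen broke after a week"],
    }
    target = ["battery died fast", "great screen and battery"]
    print(rank_sources(candidates, target))  # ['electronics', 'books']
```

The top-ranked candidate plays the role of the selected source domain. Taking the first k entries of the ranking gives a candidate subset, loosely mirroring the subset selection the abstract describes for heterogeneous datasets.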
