CL Oct 28, 2019

Adaptive Ensembling: Unsupervised Domain Adaptation for Political Document Analysis

Shrey Desai, Barea Sinno, Alex Rosenfeld, Junyi Jessy Li

arXiv:1910.12698v1 30.11000 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a challenge political science researchers face: extracting relevant documents from unlabeled corpora. It appears incremental, however, because it builds on existing domain adaptation methods.

The paper tackles the problem of analyzing political documents drawn from large unlabeled corpora. It develops an unsupervised domain adaptation framework with adaptive ensembling, which outperforms benchmarks on an expert-annotated dataset.

Insightful findings in political science often require researchers to analyze documents of a certain subject or type, yet these documents are usually contained in large corpora that do not distinguish between pertinent and non-pertinent documents. In contrast, we can find corpora that label relevant documents but have limitations (e.g., from a single source or era), preventing their use for political science research. To bridge this gap, we present adaptive ensembling, an unsupervised domain adaptation framework, equipped with a novel text classification model and time-aware training to ensure our methods work well with diachronic corpora. Experiments on an expert-annotated dataset show that our framework outperforms strong benchmarks. Further analysis indicates that our methods are more stable, learn better representations, and extract cleaner corpora for fine-grained analysis.
