LG HC Sep 25, 2019

Teacher-Student Learning Paradigm for Tri-training: An Efficient Method for Unlabeled Data Exploitation

arXiv:1909.11233v1
6 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited labeled data in NLP tasks like sentiment analysis, offering an incremental improvement to existing tri-training methods.

The paper tackles the problem of exploiting unlabeled data in semi-supervised learning by introducing a teacher-student paradigm for tri-training, which improves label quality and control. It shows that the method outperforms other baselines on a sentiment analysis task, requiring fewer labeled samples.

Because labeled data is expensive to obtain in real-world scenarios, many semi-supervised algorithms have explored how to exploit unlabeled data. The traditional tri-training algorithm and tri-training with disagreement have shown promise in tasks where labeled data is limited. In this work, we introduce a new paradigm for tri-training that mimics the real-world teacher-student learning process. We show that the adaptive teacher-student thresholds used in the proposed method provide more control over the learning process and yield higher label quality. We evaluate on the SemEval sentiment analysis task and provide comprehensive comparisons across experimental settings with varied ratios of labeled to unlabeled data. Experimental results show that our method outperforms other strong semi-supervised baselines while requiring fewer labeled training samples.
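
The abstract does not spell out how the teacher-student thresholds select pseudo-labels, so the following is only a minimal sketch of one plausible reading. It assumes that two models act as teachers for the third, and that an unlabeled sample is passed to the student when both teachers agree on a label with confidence at or above a teacher threshold while the student's own confidence in that label is below a student threshold. The function name `select_pseudo_labels`, its parameters, and the selection rule are all illustrative assumptions rather than the paper's stated procedure. The adaptive schedule for the thresholds is not modeled here.

```python
from typing import List, Sequence, Tuple

# Per-sample class probabilities produced by one model (hypothetical format).
Probs = Sequence[Sequence[float]]


def select_pseudo_labels(
    teacher_a: Probs,
    teacher_b: Probs,
    student: Probs,
    teacher_threshold: float,
    student_threshold: float,
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs for the student to train on.

    Assumed rule (not confirmed by the source): keep a sample when both
    teachers predict the same label with confidence >= teacher_threshold
    and the student's confidence in that label is < student_threshold.
    """
    selected: List[Tuple[int, int]] = []
    for i, (pa, pb, ps) in enumerate(zip(teacher_a, teacher_b, student)):
        # Most probable class according to each teacher.
        label_a = max(range(len(pa)), key=lambda k: pa[k])
        label_b = max(range(len(pb)), key=lambda k: pb[k])
        if label_a != label_b:
            continue  # teachers disagree, so there is no reliable label
        if pa[label_a] < teacher_threshold or pb[label_b] < teacher_threshold:
            continue  # at least one teacher is not confident enough
        if ps[label_a] >= student_threshold:
            continue  # the student already knows this sample
        selected.append((i, label_a))
    return selected


if __name__ == "__main__":
    teacher_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    teacher_b = [[0.95, 0.05], [0.3, 0.7], [0.1, 0.9]]
    student = [[0.5, 0.5], [0.5, 0.5], [0.05, 0.95]]
    print(select_pseudo_labels(teacher_a, teacher_b, student, 0.8, 0.6))
```

In this toy input, only the first sample passes all three checks: the teachers disagree on the second, and the student is already confident on the third. Under this reading, raising the teacher threshold would trade pseudo-label quantity for quality, which is one way thresholds could give control over the learning process.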

