CL · Nov 7, 2024

Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop

Ekaterina Artemova, Akim Tsvigun, Dominik Schlechtweg, Natalia Fedorova, Konstantin Chernyshev, Sergei Tilga, Boris Obmoroshev

arXiv:2411.04637v3 · 1.96 citations · h-index: 12

Originality: Synthesis-oriented

AI Analysis

This is an incremental tutorial for NLP practitioners in research and industry to optimize data labeling projects.

This tutorial addresses the cost and slowness of human data labeling for machine learning. It presents three strategies for speeding up annotation and reducing cost (synthetic data generation, active learning, and hybrid labeling) and emphasizes practical application through case studies and a hands-on workshop.

Training and deploying machine learning models rely on a large amount of human-annotated data. As human labeling becomes increasingly expensive and time-consuming, recent research has developed multiple strategies to speed up annotation and reduce costs and human workload: generating synthetic training data, active learning, and hybrid labeling. This tutorial is oriented toward practical applications: we will present the basics of each strategy, highlight their benefits and limitations, and discuss real-life case studies in detail. Additionally, we will walk through best practices for managing human annotators and controlling the quality of the final dataset. The tutorial includes a hands-on workshop, where attendees will be guided in implementing a hybrid annotation setup. This tutorial is designed for NLP practitioners from both research and industry backgrounds who are involved in or interested in optimizing data labeling projects.
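To make the idea of a hybrid annotation setup concrete, here is a minimal Python sketch of one common pattern: accept a model's label when its confidence is high, and send the item to a human annotator otherwise. This is an illustration under stated assumptions, not the setup used in the tutorial's workshop. The names `hybrid_label`, `toy_model`, `toy_human`, and the 0.8 threshold are all hypothetical; in practice `model_fn` would wrap an LLM call and `human_fn` would enqueue the item on an annotation platform.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LabeledItem:
    text: str
    label: str
    source: str  # "model" or "human"


def hybrid_label(
    texts: List[str],
    model_fn: Callable[[str], Tuple[str, float]],
    human_fn: Callable[[str], str],
    threshold: float = 0.8,  # hypothetical cut-off; tune on held-out data
) -> List[LabeledItem]:
    """Keep confident model labels; route the rest to a human annotator."""
    results = []
    for text in texts:
        label, confidence = model_fn(text)
        if confidence >= threshold:
            results.append(LabeledItem(text, label, "model"))
        else:
            results.append(LabeledItem(text, human_fn(text), "human"))
    return results


def toy_model(text: str) -> Tuple[str, float]:
    # Stand-in for an LLM: returns (label, confidence). Purely illustrative.
    if "great" in text:
        return "positive", 0.95
    if "awful" in text:
        return "negative", 0.90
    return "positive", 0.55  # unsure, so this falls below the threshold


def toy_human(text: str) -> str:
    # Stand-in for a human annotator's decision.
    return "neutral"


if __name__ == "__main__":
    items = hybrid_label(["great phone", "it arrived on Tuesday"], toy_model, toy_human)
    for item in items:
        print(item)
```

The threshold controls the trade-off between cost and quality: raising it sends more items to humans, lowering it accepts more model labels unchecked.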
