CL, AI, LG | Apr 19, 2021

skweak: Weak Supervision Made Easy for NLP

arXiv:2104.09683v1 | 719 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This toolkit aims to reduce the manual labeling effort required of NLP practitioners. Its contribution is incremental, since it builds on existing weak supervision methods.

The authors introduce skweak, a Python toolkit that lets NLP developers apply weak supervision to tasks such as NER and sentiment analysis: labeling functions annotate the data automatically, and a generative model aggregates their outputs. The toolkit is released as open source.

We present skweak, a versatile, Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks. Weak supervision is an emerging machine learning paradigm based on a simple idea: instead of labelling data points by hand, we use labelling functions derived from domain knowledge to automatically obtain annotations for a given dataset. The resulting labels are then aggregated with a generative model that estimates the accuracy (and possible confusions) of each labelling function. The skweak toolkit makes it easy to implement a large spectrum of labelling functions (such as heuristics, gazetteers, neural models or linguistic constraints) on text data, apply them on a corpus, and aggregate their results in a fully unsupervised fashion. skweak is especially designed to facilitate the use of weak supervision for NLP tasks such as text classification and sequence labelling. We illustrate the use of skweak for NER and sentiment analysis. skweak is released under an open-source license and is available at: https://github.com/NorskRegnesentral/skweak
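The idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It does not use skweak's API: the word lists, the function names, and the `ABSTAIN` convention are all hypothetical, and a simple majority vote stands in for the generative model that skweak uses to weight each labelling function by its estimated accuracy.

```python
from collections import Counter
from typing import Callable, List, Optional

# Convention assumed for this sketch: a labelling function returns None
# when it has no opinion about a text.
ABSTAIN = None

# Hypothetical word lists standing in for domain knowledge.
POSITIVE_WORDS = {"great", "excellent", "love", "wonderful"}
NEGATIVE_WORDS = {"terrible", "awful", "hate", "boring"}


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [t.strip(".,!?").lower() for t in text.split()]


def lf_positive_lexicon(text: str) -> Optional[str]:
    """Gazetteer-style function: vote POS if any positive word occurs."""
    return "POS" if POSITIVE_WORDS & set(tokens(text)) else ABSTAIN


def lf_negative_lexicon(text: str) -> Optional[str]:
    """Gazetteer-style function: vote NEG if any negative word occurs."""
    return "NEG" if NEGATIVE_WORDS & set(tokens(text)) else ABSTAIN


def lf_negated_positive(text: str) -> Optional[str]:
    """Heuristic: 'not' directly before a positive word suggests NEG."""
    toks = tokens(text)
    for prev, cur in zip(toks, toks[1:]):
        if prev == "not" and cur in POSITIVE_WORDS:
            return "NEG"
    return ABSTAIN


LABELLING_FUNCTIONS: List[Callable[[str], Optional[str]]] = [
    lf_positive_lexicon,
    lf_negative_lexicon,
    lf_negated_positive,
]


def apply_lfs(text: str) -> List[Optional[str]]:
    """Collect one vote (or abstention) per labelling function."""
    return [lf(text) for lf in LABELLING_FUNCTIONS]


def majority_vote(votes: List[Optional[str]]) -> Optional[str]:
    """Aggregate votes; abstain when nobody votes or the top labels tie.

    This is a simplification: it treats every function as equally reliable,
    whereas a generative model would estimate per-function accuracies.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(text: str) -> Optional[str]:
    """Apply all labelling functions to a text and aggregate their votes."""
    return majority_vote(apply_lfs(text))
```

Calling `weak_label` on each text in a corpus yields automatically obtained labels, with `None` marking texts on which the functions abstain or disagree evenly. Such ties are where an aggregation model that estimates the reliability of each function would be expected to help most.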

Code Implementations: 1 repo
