LeQua@CLEF2022: Learning to Quantify
LeQua 2022 provides a standardized benchmark for researchers working on learning to quantify, although the contribution is incremental, since it builds on existing evaluation frameworks.
The paper introduces LeQua 2022, a lab for evaluating methods that predict class frequencies in unlabeled text datasets, addressing the suboptimality of traditional classification-based approaches.
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form.
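To make the contrast in the abstract concrete, the following minimal sketch shows the "classify and count" baseline for the binary setting, next to one well-known correction from the quantification literature, adjusted classify and count. The abstract does not name any specific method, so the choice of this correction is an assumption made for illustration. The function names, the label strings, and the classifier rates (true positive rate 0.8, false positive rate 0.1) are likewise hypothetical; in practice the rates would be estimated on held-out labelled data.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate class prevalences as the fraction of documents assigned to each class."""
    if not predicted_labels:
        raise ValueError("need at least one predicted label")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts[c] / total for c in classes}


def adjusted_classify_and_count(cc_positive_rate, tpr, fpr):
    """Correct a binary classify-and-count estimate using the classifier's error rates.

    Solves cc = tpr * p + fpr * (1 - p) for the true positive prevalence p,
    then clips the result to [0, 1].
    """
    if tpr == fpr:
        # The classifier carries no signal; fall back to the uncorrected estimate.
        return cc_positive_rate
    adjusted = (cc_positive_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


# Hypothetical sample: 100 unlabelled documents with a true positive prevalence of 0.20.
# A classifier with tpr=0.8 and fpr=0.1 labels 0.8*20 + 0.1*80 = 24 of them as positive.
predicted = ["pos"] * 24 + ["neg"] * 76

cc_estimate = classify_and_count(predicted, ["pos", "neg"])
acc_estimate = adjusted_classify_and_count(cc_estimate["pos"], tpr=0.8, fpr=0.1)

print(cc_estimate["pos"])      # biased estimate from plain counting
print(round(acc_estimate, 4))  # corrected estimate
```

In this constructed case plain counting overestimates the positive prevalence (0.24 against a true 0.20), while the correction recovers the true value because the assumed error rates are exact. With rates estimated from finite data the correction is only approximate.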