CL | AI | LG | Feb 4, 2024

Absolute convergence and error thresholds in non-active adaptive sampling

arXiv:2402.02522v1 | 9 citations | h-index: 12 | Journal of computer and system sciences (Print)
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of automated sample size determination in machine learning for practitioners, though it appears incremental as it builds on existing non-active adaptive sampling frameworks.

The paper tackles the problem of deciding when to stop sampling in non-active adaptive learning. It proposes a method for calculating absolute convergence and error thresholds, which identifies when model quality plateaus and estimates how close the model is to that plateau. The method is tested in natural language processing, on the generation of part-of-speech taggers.

Non-active adaptive sampling is a way of building machine learning models from a training database, intended to derive a guaranteed sample size dynamically and automatically. In this context, and regardless of the strategy used for both scheduling and generating weak predictors, we describe a proposal for calculating absolute convergence and error thresholds. It not only makes it possible to establish when the quality of the model no longer increases, but also supplies a proximity condition to estimate in absolute terms how close the model is to achieving that goal, thus supporting decision making for fine-tuning learning parameters in model selection. The technique is proved correct and complete with respect to our working hypotheses, and it strengthens the robustness of the sampling scheme. Tests meet our expectations and illustrate the proposal in the domain of natural language processing, taking the generation of part-of-speech taggers as a case study.
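To make the idea of a proximity condition concrete, here is a minimal sketch of a stopping rule on a learning curve. It is not the method of the paper, whose construction is not given in the abstract. It assumes that accuracy is measured on a geometric schedule of sample sizes (each double the previous one) and that the learning curve roughly follows a power law, in which case Aitken's delta-squared extrapolation estimates the accuracy at which the curve levels off. The names `aitken_limit`, `proximity`, `should_stop` and the tolerance `tau` are hypothetical.

```python
def aitken_limit(x0, x1, x2):
    """Aitken delta-squared estimate of the limit of a sequence,
    computed from three consecutive values."""
    denom = (x2 - x1) - (x1 - x0)
    if denom == 0:
        # Flat or exactly linear sequence: no extrapolation is possible,
        # so fall back to the latest observed value.
        return x2
    return x2 - (x2 - x1) ** 2 / denom


def proximity(accuracies):
    """Return (estimated limit, absolute gap between the estimated limit
    and the latest accuracy). Needs at least three observations."""
    limit = aitken_limit(*accuracies[-3:])
    return limit, abs(limit - accuracies[-1])


def should_stop(accuracies, tau=0.005):
    """Assumed stopping rule: stop once the estimated gap to the
    plateau is at most tau (an absolute tolerance)."""
    if len(accuracies) < 3:
        return False
    _, gap = proximity(accuracies)
    return gap <= tau


if __name__ == "__main__":
    # Synthetic power-law learning curve, used only for illustration.
    sizes = [1000 * 2 ** k for k in range(8)]
    curve = [0.97 - 0.5 * n ** -0.5 for n in sizes]
    for i in range(3, len(curve) + 1):
        limit, gap = proximity(curve[:i])
        print(sizes[i - 1], round(limit, 4), round(gap, 4), should_stop(curve[:i]))
```

On real learning curves the observations are noisy, so an estimate based on three points can be unstable. A practical variant would smooth the curve or fit it over a longer window before extrapolating.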

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
