LG · AI · Jul 6, 2022

Mitigating shortage of labeled data using clustering-based active learning with diversity exploration

arXiv:2207.02964v1 · 2 citations · h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the problem of data labeling inefficiency for machine learning practitioners, but it appears incremental as it builds on existing active learning methods.

The paper tackles the problem of labeled data shortage by proposing a clustering-based active learning framework called ALCS, which uses density-based clustering and a bi-cluster boundary query procedure to improve classification performance for overlapped classes, with experimental results justifying its efficacy.

In this paper, we proposed a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), to address the shortage of labeled data. ALCS employs a density-based clustering approach to explore the cluster structure from the data without requiring exhaustive parameter tuning. A bi-cluster boundary-based sample query procedure is introduced to improve the learning performance for classifying highly overlapped classes. Additionally, we developed an effective diversity exploration strategy to address the redundancy among queried samples. Our experimental results justified the efficacy of the ALCS approach.
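The abstract does not specify how the boundary query or the diversity strategy is computed, so the following is only a minimal sketch of the general idea, not the ALCS procedure itself. It assumes cluster centroids are already available (for example, from a density-based clustering step), scores each point by the margin between its distances to the two nearest centroids, and then greedily skips candidates that lie too close to an already selected query. The function names `boundary_scores` and `select_diverse_queries`, and the parameters `budget` and `min_dist`, are hypothetical and chosen for illustration.

```python
import math


def boundary_scores(points, centroids):
    """Return, per point, the gap between its two nearest centroid distances.

    A small gap means the point sits near the boundary between two clusters.
    This margin is an illustrative stand-in, not the paper's criterion.
    """
    scores = []
    for p in points:
        dists = sorted(math.dist(p, c) for c in centroids)
        scores.append(dists[1] - dists[0])
    return scores


def select_diverse_queries(points, centroids, budget, min_dist):
    """Greedily pick up to `budget` boundary points that are mutually spread out.

    Candidates are visited from smallest to largest margin; a candidate is
    skipped if it lies within `min_dist` of a point that was already chosen,
    which reduces redundancy among the queried samples.
    """
    scores = boundary_scores(points, centroids)
    order = sorted(range(len(points)), key=lambda i: scores[i])
    chosen = []
    for i in order:
        if len(chosen) == budget:
            break
        if all(math.dist(points[i], points[j]) >= min_dist for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: two cluster centres on the x-axis, with several points between them.
points = [(0.9, 0.0), (1.0, 0.05), (1.1, 0.0), (1.0, 2.0), (0.0, 0.0), (2.0, 0.0)]
centroids = [(0.0, 0.0), (2.0, 0.0)]

queries = select_diverse_queries(points, centroids, budget=2, min_dist=0.5)
print(queries)  # indices of the points that would be sent for labeling
```

In this toy run, the three points clustered around (1, 0) are all near the boundary, but only one of them is queried; the second query goes to the distant boundary point at (1, 2). Without the `min_dist` filter, both queries would have been spent on near-duplicate samples.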

Code Implementations: 1 repo