Filtering out mislabeled training instances using black-box optimization and quantum annealing

arXiv:2501.06916v2 | 2 citations | h-index: 2 | Sci Rep
AI Analysis

This work addresses dataset quality issues in supervised learning, though it appears incremental: it builds on existing noise-removal strategies and adds quantum annealing as the sampling step.

The study tackled the problem of mislabeled training instances degrading model generalization by proposing a method that combines surrogate model-based black-box optimization with quantum annealing to filter out high-risk mislabeled instances. Experiments on a noisy majority bit task showed that using D-Wave's physical quantum annealer achieved faster optimization and higher-quality training subsets compared to simulated alternatives.

This study proposes an approach for removing mislabeled instances from contaminated training datasets by combining surrogate model-based black-box optimization (BBO) with postprocessing and quantum annealing. Mislabeled training instances, a common issue in real-world datasets, often degrade model generalization, necessitating robust and efficient noise-removal strategies. The proposed method evaluates filtered training subsets based on validation loss, iteratively refines loss estimates through surrogate model-based BBO with postprocessing, and leverages quantum annealing to efficiently sample diverse training subsets with low validation error. Experiments on a noisy majority bit task demonstrate the method's ability to prioritize the removal of high-risk mislabeled instances. Integrating D-Wave's clique sampler running on a physical quantum annealer achieves faster optimization and higher-quality training subsets compared to OpenJij's simulated quantum annealing sampler or Neal's simulated annealing sampler, offering a scalable framework for enhancing dataset quality. This work highlights the effectiveness of the proposed method for supervised learning tasks, with future directions including its application to unsupervised learning, real-world datasets, and large-scale implementations.
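To make the objective concrete, the following is a minimal sketch of how a filtered training subset could be scored by validation loss on a toy majority bit task. It is not the authors' implementation. The 1-nearest-neighbour classifier, the toy data, the function names, and the tie-break toward keeping more instances are all assumptions made for illustration. Exhaustive enumeration of the binary masks stands in for the surrogate model and the annealing sampler, which is feasible only at this toy size.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two bit strings differ."""
    return sum(u != v for u, v in zip(a, b))


def validation_loss(mask, train_x, train_y, val_x, val_y):
    """0-1 validation error of a 1-nearest-neighbour classifier trained only
    on the instances whose mask bit is 1 (hypothetical stand-in model)."""
    kept = [i for i, m in enumerate(mask) if m]
    if not kept:
        return 1.0  # assumption: an empty training subset is the worst case
    errors = 0
    for x, y in zip(val_x, val_y):
        # min() returns the first minimum, so ties go to the lowest index.
        nearest = min(kept, key=lambda i: hamming(train_x[i], x))
        errors += train_y[nearest] != y
    return errors / len(val_x)


# Toy majority bit data: the label is 1 when most bits are 1.
train_x = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (0, 0, 1)]
train_y = [1, 0, 0, 0]  # index 1 is mislabeled (its true label is 1)
val_x = [(1, 1, 0), (0, 0, 0)]
val_y = [1, 0]  # clean validation labels

# Loss when nothing is filtered out.
full_loss = validation_loss((1, 1, 1, 1), train_x, train_y, val_x, val_y)

# Brute-force search over all 2^4 masks; among masks with equal loss,
# prefer the one that keeps more instances (an illustrative tie-break).
best_mask = min(
    product((0, 1), repeat=len(train_x)),
    key=lambda m: (validation_loss(m, train_x, train_y, val_x, val_y), -sum(m)),
)
best_loss = validation_loss(best_mask, train_x, train_y, val_x, val_y)

print(full_loss, best_mask, best_loss)
```

In this toy case the lowest-loss mask drops only the mislabeled instance. With realistic dataset sizes the mask space cannot be enumerated, which is where the abstract's surrogate model and annealing-based sampling come in.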

