DSDBMLMay 12, 2021

How to Design Robust Algorithms using Noisy Comparison Oracle

arXiv:2105.05782v116 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of robust clustering for applications where exact distance computations are infeasible, offering incremental improvements in handling noise in comparison-based methods.

The paper tackles the problem of performing clustering operations like k-center and hierarchical clustering when exact distances are unavailable, by using a noisy comparison oracle that answers relative distance queries; it introduces adversarial and probabilistic noise models, provides robust algorithms with proven approximation guarantees and query complexity, and validates them empirically on real-world datasets.

Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as $k$-center clustering and agglomerative hierarchical clustering. These techniques crucially rely on accurate estimation of pairwise distance between records. However, computing exact features of the records, and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distance between the queried points such as `Is point u closer to v or w closer to x?'. However, it is possible that some queries are easier to answer than others using a comparison oracle. We capture this by introducing two different noise models called adversarial and probabilistic noise. In this paper, we study various problems that include finding maximum, nearest/farthest neighbor search under these noise models. Building upon the techniques we develop for these comparison operations, we give robust algorithms for k-center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with a high probability and analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes