MLDSIRLGMar 30, 2018

Engineering a Simplified 0-Bit Consistent Weighted Sampling

arXiv:1804.00069v23 citations
Originality Incremental advance
AI Analysis

This provides a faster alternative for data analysis and classification tasks using weighted sampling, though it is incremental as it builds directly on the existing ICWS algorithm.

The authors tackled the computational burden of the ICWS algorithm for Min-Hashing on real-valued datasets by developing a simplified approach, achieving over 20x speedups while maintaining the same quality of results.

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrial, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used, and provides state-of-the-art performance for this problem space. However, ICWS suffers a computational burden as the sketch size K increases. We develop a new Simplified approach to the ICWS algorithm, that enables us to obtain over 20x speedups compared to the standard algorithm. The veracity of our approach is demonstrated empirically on multiple datasets and scenarios, showing that our new Simplified CWS obtains the same quality of results while being an order of magnitude faster.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes