ASIR · Nov 27, 2018

Large-scale Speaker Retrieval on Random Speaker Variability Subspace

arXiv:1811.10812v2
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient voice identity search in large datasets and represents an incremental improvement over existing LSH methods.

The paper tackles the problem of fast speaker retrieval in large-scale data by proposing a Random Speaker-variability Subspace (RSS) projection for Locality Sensitive Hashing, resulting in 100 times faster retrieval than linear search and 7 times faster than standard LSH.

This paper describes a fast speaker search system that retrieves segments of the same voice identity from large-scale data. A recent study shows that Locality Sensitive Hashing (LSH), used in conjunction with i-vectors, enables quick retrieval of a relevant voice in large-scale data while maintaining accuracy. In this paper, we propose Random Speaker-variability Subspace (RSS) projection to map data into LSH-based hash tables. We hypothesize that, rather than projecting onto a completely random subspace chosen without considering the data, projecting onto a randomly generated speaker variability space gives a better chance of placing representations of the same speaker into the same hash bins, so fewer hash tables are needed. Multiple RSSs can be generated by randomly selecting subsets of speakers from a large speaker cohort. In our experiments, the proposed approach is 100 times faster than linear search and 7 times faster than LSH.
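The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how each speaker variability subspace is computed or how projections are turned into hash keys, so everything concrete here is an assumption made for illustration: the subspace is taken as the leading singular directions of the centered mean vectors of a random speaker subset, hash keys are the signs of the projections, and candidates are re-ranked by cosine similarity. The function names (`make_rss_projection`, `hash_key`, `build_tables`, `query`) are hypothetical, and random vectors stand in for i-vectors.

```python
import numpy as np

def make_rss_projection(speaker_means, n_speakers, n_bits, rng):
    """Build one projection from a random subset of cohort speakers.

    Assumption: the subspace is spanned by the top singular directions
    of the centered per-speaker mean vectors. The paper may use a
    different construction.
    """
    idx = rng.choice(len(speaker_means), size=n_speakers, replace=False)
    subset = speaker_means[idx]
    centered = subset - subset.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_bits]  # shape: (n_bits, dim)

def hash_key(projection, x):
    """Sign-based hash key (assumed scheme): one bit per direction."""
    return tuple(((projection @ x) > 0).tolist())

def build_tables(projections, data):
    """One hash table per projection, mapping key -> list of row indices."""
    tables = []
    for p in projections:
        table = {}
        for i, x in enumerate(data):
            table.setdefault(hash_key(p, x), []).append(i)
        tables.append(table)
    return tables

def query(projections, tables, data, q):
    """Collect candidates from matching bins, then re-rank by cosine."""
    candidates = set()
    for p, t in zip(projections, tables):
        candidates.update(t.get(hash_key(p, q), []))
    if not candidates:
        return None
    cand = sorted(candidates)
    vecs = data[cand]
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    return cand[int(np.argmax(sims))]

# Synthetic stand-ins for i-vectors (not real data).
rng = np.random.default_rng(0)
dim = 16
speaker_means = rng.normal(size=(50, dim))  # cohort of 50 speakers
data = rng.normal(size=(200, dim))          # database to search

# Four hash tables, each built from a different random speaker subset.
projections = [make_rss_projection(speaker_means, 20, 8, rng) for _ in range(4)]
tables = build_tables(projections, data)
best = query(projections, tables, data, data[3])
```

Each table uses a different random speaker subset, which mirrors the abstract's statement that multiple RSSs come from randomly selected subsets of a large cohort. Only the vectors that share a bin with the query are compared exactly, which is where the speed-up over linear search would come from.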
