Neighborhood density estimation using space-partitioning based hashing schemes
This work addresses two problems, single-cell data analysis in bioinformatics and machine learning on streaming data, and presents novel methods that offer incremental improvements on each.
The thesis tackles anomaly detection in large-scale single-cell RNA sequencing data and concept drift detection in streaming data. FiRE/FiRE.1 showed superior performance against state-of-the-art techniques, and Enhash proved highly competitive in time and accuracy across various drift types.
This work introduces FiRE/FiRE.1, a novel sketching-based algorithm for anomaly detection to quickly identify rare cell sub-populations in large-scale single-cell RNA sequencing data. This method demonstrated superior performance against state-of-the-art techniques. Furthermore, the thesis proposes Enhash, a fast and resource-efficient ensemble learner that uses projection hashing to detect concept drift in streaming data, proving highly competitive in time and accuracy across various drift types.
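As a rough illustration of the idea in the title, the sketch below estimates neighborhood density by hashing points into buckets defined by random axis-aligned cuts and treating sparsely occupied buckets as evidence of rarity. It is a generic toy example and not the FiRE/FiRE.1 or Enhash implementation: the function name `rareness_scores`, the parameters `n_tables` and `n_dims`, the scoring formula, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def rareness_scores(points, n_tables=50, n_dims=4, seed=0):
    """Score each point by how sparsely populated its hash buckets are.

    A higher score means a lower estimated neighborhood density.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_features = len(points[0])
    # Per-feature value range, used to draw cut positions.
    lo = [min(p[j] for p in points) for j in range(n_features)]
    hi = [max(p[j] for p in points) for j in range(n_features)]
    scores = [0.0] * len(points)

    for _ in range(n_tables):
        # Each table partitions space with a few random axis-aligned cuts.
        dims = [rng.randrange(n_features) for _ in range(n_dims)]
        cuts = [rng.uniform(lo[j], hi[j]) for j in dims]
        # The hash key records on which side of every cut a point falls.
        keys = [tuple(p[j] > c for j, c in zip(dims, cuts)) for p in points]
        occupancy = Counter(keys)
        for i, key in enumerate(keys):
            # A sparse bucket gives a small fraction and a large contribution.
            scores[i] -= math.log(occupancy[key] / len(points))

    # Average over tables so the score does not grow with n_tables.
    return [s / n_tables for s in scores]


# Toy data (assumed): a Gaussian cloud plus one planted outlier.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
data.append((8.0, 8.0))
scores = rareness_scores(data)
```

Averaging over many independent hash tables is what makes the estimate usable: a single partition is noisy, while a point that lands in sparse buckets across most tables is consistently far from the bulk of the data. Bucket counting takes time linear in the number of points per table, which is the general reason hashing-based schemes suit large data sets.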