Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities
This work addresses a key challenge in modern neuroscience: quantifying uncertainty in neuron type classification for the Drosophila larva mushroom body. The contribution appears incremental, however, because it adapts existing forest-based approaches to a known bottleneck.
The authors address the problem of estimating information-theoretic quantities, such as conditional entropy and mutual information, in high-dimensional or multi-scale data, where existing nearest neighbor methods fail. They propose decision forest-based adaptive nearest neighbor estimators and show that these estimate the quantities effectively, including in a real-world connectome application.
Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees in certain simple scenarios. However, existing approaches fail in high-dimensional settings and when different features are measured on different scales. We propose decision forest-based adaptive nearest neighbor estimators and show that they are able to effectively estimate posterior probabilities, conditional entropies, and mutual information even in the aforementioned settings. We provide an extensive study of efficacy for classification and posterior probability estimation, and prove certain forest-based approaches to be consistent estimators of the true posteriors and derived information-theoretic quantities under certain assumptions. In a real-world connectome application, we quantify the uncertainty about neuron type given various cellular features in the Drosophila larva mushroom body, a key challenge for modern neuroscience.
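To make the link between posterior estimates and the derived quantities concrete, the following is a minimal sketch of a plug-in computation, not the authors' estimator. It assumes per-sample class posteriors are already available from some classifier (for example, a decision forest) and uses the standard identities H(Y|X) = E[H(Y | X = x)] and I(X;Y) = H(Y) - H(Y|X). The function names and the toy posteriors are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in nats of one probability vector; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def conditional_entropy(posteriors):
    """Plug-in estimate of H(Y|X): the average entropy of the per-sample posteriors."""
    return sum(entropy(row) for row in posteriors) / len(posteriors)


def mutual_information(labels, posteriors):
    """Plug-in estimate of I(X;Y) = H(Y) - H(Y|X), with H(Y) from label frequencies."""
    n = len(labels)
    priors = [count / n for count in Counter(labels).values()]
    return entropy(priors) - conditional_entropy(posteriors)


# Hypothetical posteriors for four samples and two balanced classes.
labels = [0, 0, 1, 1]
confident = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4

print(conditional_entropy(confident))            # no remaining uncertainty about Y
print(mutual_information(labels, confident))     # equals H(Y) = log 2 nats
print(mutual_information(labels, uninformative)) # features carry no information
```

In this sketch the quality of the result depends entirely on the quality of the posteriors passed in, which is why consistent posterior estimation matters for the derived quantities.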