Conditional Probability Tree Estimation Analysis and Algorithms
This work addresses efficient conditional probability estimation for large label sets, presenting a novel online algorithm with theoretical guarantees, though it appears incremental as it builds on tree-based reductions.
The paper tackles the problem of estimating conditional probabilities of labels in O(log n) time by reducing it to binary regression problems organized in a tree structure, proving a regret bound scaling with tree depth, and empirically testing an online algorithm that constructs a logarithmic depth tree, achieving success on a dataset with about 10^6 labels.
We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly 106 labels.