Black-box Generalization of Machine Teaching
This work targets the efficiency of active learning for machine learning practitioners by tightening the theoretical bounds on generalization error and label complexity.
The paper tackles the problem of hypothesis-pruning in active learning by introducing a black-box teaching hypothesis $h^{\mathcal{T}}$ with a tighter slack term, which theoretically reduces the generalization error upper bound from $R(h^*)+4Δ_{T-1}$ to approximately $R(h^{\mathcal{T}})+2Δ_{T-1}$ and the label complexity upper bound from $4θ\left(TR(h^{*})+2O(\sqrt{T})\right)$ to approximately $2θ\left(2TR(h^{\mathcal{T}})+3O(\sqrt{T})\right)$.
Hypothesis-pruning maximizes the hypothesis updates in active learning in order to find the desired unlabeled data. An inherent assumption is that this manner of learning steers those updates toward the optimal hypothesis. However, convergence may not be guaranteed if the incremental updates are negative and disordered. In this paper, we introduce a black-box teaching hypothesis $h^\mathcal{T}$ that employs a tighter slack term $\left(1+\mathcal{F}^{\mathcal{T}}(\widehat{h}_t)\right)Δ_t$ in place of the typical $2Δ_t$ for pruning. Theoretically, we prove that, under the guidance of this teaching hypothesis, the learner converges to tighter generalization error and label complexity bounds than non-educated learners, who receive no guidance from a teacher: 1) the generalization error upper bound is reduced from $R(h^*)+4Δ_{T-1}$ to approximately $R(h^{\mathcal{T}})+2Δ_{T-1}$, and 2) the label complexity upper bound is reduced from $4 θ\left(TR(h^{*})+2O(\sqrt{T})\right)$ to approximately $2θ\left(2TR(h^{\mathcal{T}})+3 O(\sqrt{T})\right)$. To keep our assumption strict, we first propose a self-improvement of teaching for the case where $h^\mathcal{T}$ only loosely approximates $h^*$. With respect to the learner, we further consider two teaching scenarios: teaching a white-box learner and teaching a black-box learner. Experiments verify this idea and show better generalization performance than fundamental active learning strategies such as IWAL and IWAL-D.
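To make the role of the slack term concrete, the following is a minimal Python sketch of a single pruning step. It is an illustration under stated assumptions, not the paper's algorithm: the function names `pruning_slack` and `prune`, the use of the smallest empirical risk as the reference, and the restriction of $\mathcal{F}^{\mathcal{T}}(\widehat{h}_t)$ to $[0, 1]$ are all hypothetical choices made here so that the taught slack is never larger than $2Δ_t$.

```python
from typing import List, Optional, Sequence


def pruning_slack(delta_t: float, teacher_term: Optional[float] = None) -> float:
    """Return the slack used to prune hypotheses at round t.

    Without a teacher, the typical slack 2 * delta_t is used.
    With a teacher, the slack is (1 + F) * delta_t, where F stands in for
    F^T(h_t). Assumption of this sketch: F lies in [0, 1].
    """
    if teacher_term is None:
        return 2.0 * delta_t
    if not 0.0 <= teacher_term <= 1.0:
        raise ValueError("teacher_term is assumed to lie in [0, 1]")
    return (1.0 + teacher_term) * delta_t


def prune(empirical_risks: Sequence[float], slack: float) -> List[int]:
    """Keep the indices of hypotheses whose empirical risk is within
    `slack` of the smallest empirical risk (the reference choice is an
    assumption of this sketch)."""
    best = min(empirical_risks)
    return [i for i, risk in enumerate(empirical_risks) if risk <= best + slack]


if __name__ == "__main__":
    # Made-up empirical risks for four candidate hypotheses.
    risks = [0.10, 0.14, 0.19, 0.25]
    delta_t = 0.05

    untaught = prune(risks, pruning_slack(delta_t))
    taught = prune(risks, pruning_slack(delta_t, teacher_term=0.5))
    print("kept without teacher:", untaught)
    print("kept with teacher:   ", taught)
```

With these made-up numbers the taught slack is smaller than the untaught one, so the taught learner discards one more candidate hypothesis in the same round. This mirrors, in a toy setting, why a tighter slack can shrink the candidate set faster.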