AI · Nov 2, 2017

Interpretable and Pedagogical Examples

Smitha Milli, Pieter Abbeel, Igor Mordatch

arXiv:1711.00694v2 16.920 citations

Originality: Incremental advance

AI Analysis

The paper addresses interpretability in AI teaching systems. The advance is rated incremental because it builds on existing neural network teaching methods.

The paper tackles the problem that neural network teachers generate examples that teach student networks effectively but are uninterpretable to humans. It shows that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. The authors evaluate this by measuring how closely the teacher's strategies match intuitive strategies in each domain and by testing how effectively those strategies teach humans, across rule-based, probabilistic, boolean, and hierarchical concepts.

Teachers intentionally pick the most informative examples to show their students. However, if the teacher and student are neural networks, the examples that the teacher network learns to give, although effective at teaching the student, are typically uninterpretable. We show that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. We evaluate interpretability by (1) measuring the similarity of the teacher's emergent strategies to intuitive strategies in each domain and (2) conducting human experiments to evaluate how effective the teacher's strategies are at teaching humans. We show that the teacher network learns to select or generate interpretable, pedagogical examples to teach rule-based, probabilistic, boolean, and hierarchical concepts.
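The contrast between iterative and joint training can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical one-dimensional rule-based concept (an integer interval), a hand-coded student that stands in for a network already fitted to randomly chosen positive examples, and an exhaustive search that stands in for training the teacher against that fixed student. All names (`student_guess`, `best_response_teacher`, `N`) are made up for illustration.

```python
import itertools
import random

N = 20  # assumed toy domain: integers 0 .. N-1


def student_guess(examples):
    """Stand-in for a student fitted to random positive examples:
    it guesses the tightest interval covering what it was shown."""
    return min(examples), max(examples)


def loss(guess, concept):
    """Total absolute error on the two interval endpoints."""
    return abs(guess[0] - concept[0]) + abs(guess[1] - concept[1])


def random_teacher(concept, k, rng):
    """Baseline teacher: k positive examples drawn uniformly from the concept."""
    lo, hi = concept
    return tuple(rng.randint(lo, hi) for _ in range(k))


def best_response_teacher(concept, k):
    """Second stage of the iterative scheme in this sketch: the student is
    held FIXED and the teacher picks the examples that minimize its loss.
    Exhaustive search replaces gradient-based training here (assumption)."""
    lo, hi = concept
    candidates = itertools.combinations_with_replacement(range(lo, hi + 1), k)
    return min(candidates, key=lambda ex: loss(student_guess(ex), concept))


if __name__ == "__main__":
    rng = random.Random(0)
    concept = (3, 11)  # hypothetical target interval inside 0 .. N-1
    shown = best_response_teacher(concept, k=2)
    print("teacher shows:", shown)
    print("student loss:", loss(student_guess(shown), concept))
    baseline = random_teacher(concept, 2, rng)
    print("random baseline loss:", loss(student_guess(baseline), concept))
```

In this toy setting the teacher ends up showing the two endpoints of the interval, which matches the strategy a person would intuitively pick. Because the student's interpretation of examples is fixed before the teacher is optimized, the teacher cannot rely on an arbitrary private code, which is the intuition the abstract gives for why iterative training yields interpretable examples.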
