CL | Nov 12, 2023

Large Language Models are In-context Teachers for Knowledge Reasoning

Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu

arXiv:2311.06985v38.324 citations | h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient knowledge reasoning for LLM users by reducing reliance on human experts. The advance is incremental because it builds on existing in-context learning methods.

The paper tackles the problem of costly, high-variance human-crafted in-context demonstrations for teaching reasoning. It shows that an LLM's self-elicited explanations, used as in-context examples, significantly outperform human-crafted ones, and that Teach-Back lets a 7B teacher model surpass human teachers by around 5% in test accuracy on medical QA.

In this work, we study in-context teaching (ICT), where a teacher provides in-context example rationales to teach a student to reason over unseen cases. Human teachers are usually required to craft in-context demonstrations, which are costly and have high variance. We ask whether a large language model (LLM) can serve as a more effective in-context teacher than humans, for itself or for other LLMs. Inspired by the Encoding Specificity Hypothesis from human episodic memory, we hypothesize that in-context exemplars crafted by the teacher should match the training data of the student. This hypothesis motivates Self-Explain, in which an LLM's self-elicited explanations are used as in-context demonstrations for prompting that same LLM, since they are generalized from the model's training examples. Self-Explain significantly outperforms human-crafted exemplars and other baselines. Furthermore, we show that for ICT, rationales from different teacher LLMs or human experts make better in-context demonstrations the more closely they resemble the student LLM's self-explanations, which supports our encoding specificity hypothesis. We then propose Teach-Back, which aligns a teacher LLM with the student to enhance ICT performance. For example, Teach-Back enables a 7B model to teach the much larger GPT-3.5 in context, surpassing human teachers by around 5% in test accuracy on medical question answering.
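The Self-Explain idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function name `self_explain_prompt`, and the `generate` callable (a stand-in for any text-completion call to the student LLM) are all assumptions made for illustration, as is the choice to elicit each rationale by showing the model a question together with its known answer.

```python
from typing import Callable, List, Tuple


def self_explain_prompt(
    generate: Callable[[str], str],
    demos: List[Tuple[str, str]],
    test_question: str,
) -> str:
    """Build a few-shot prompt whose rationales are written by the student model itself.

    `generate` is a hypothetical wrapper around the student LLM (prompt in, text out).
    `demos` holds (question, answer) pairs with known answers.
    """
    blocks = []
    for question, answer in demos:
        # Step 1 (assumed template): ask the model to explain the known answer.
        elicit = (
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Explain why this answer is correct:"
        )
        rationale = generate(elicit).strip()
        # Step 2: keep the self-elicited rationale as an in-context demonstration.
        blocks.append(
            f"Question: {question}\nExplanation: {rationale}\nAnswer: {answer}"
        )
    # Step 3: append the unseen question; the model continues from "Explanation:".
    blocks.append(f"Question: {test_question}\nExplanation:")
    return "\n\n".join(blocks)


def toy_generate(prompt: str) -> str:
    """Stub standing in for a real LLM call, so the sketch runs offline."""
    return "stub rationale"
```

Calling `self_explain_prompt(toy_generate, [("Q1", "A1")], "Q2")` returns a prompt with one self-explained demonstration followed by the open test question. In practice `toy_generate` would be replaced by a call to the same model that later answers the final prompt, which is what makes the demonstrations self-elicited rather than human-crafted.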
