Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
This work addresses a practical bottleneck in deploying efficient LLMs in resource-constrained settings, although the contribution is incremental because it builds on existing decoding methods.
The paper aims to improve the generative quality of small-scale large language models (LLMs) under limited supervision, where a larger LLM can generate only a few tokens. It develops an algorithm that adaptively trusts or disregards the LLM's predictions based on the small-scale LLM's confidence, yielding consistent improvements over conventional decoding strategies across various models and datasets.
How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in settings with no restriction on how much LLM supervision can be used, giving rise to many decoding algorithms that exploit such supervision without further training. However, it remains unclear what an effective strategy is under the $\textit{limited supervision}$ scenario, where we assume that no more than a few tokens can be generated by LLMs. To this end, we develop an algorithm that effectively aggregates the small-scale LLM and LLM predictions on the initial tokens, so that the generated tokens more accurately condition the subsequent token generation, which is performed by the small-scale LLM alone. Critically, we find that it is essential to adaptively overtrust or disregard the LLM prediction based on the confidence of the small-scale LLM. Through experiments on a wide range of models and datasets, we demonstrate that our method provides a consistent improvement over conventional decoding strategies. Code: https://github.com/HJ-Ok/DecLimSup
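The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). Every name here (`choose_alpha`, `aggregate`, `decode_first_token`), the entropy-based confidence measure, the threshold, and the two weight values are hypothetical choices made for illustration. In particular, the direction of the rule (overtrust the LLM when the small-scale LLM is unsure, disregard it when the small-scale LLM is confident) is an assumption; the abstract only states that the choice depends on the small-scale LLM's confidence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy divided by its maximum, so the result lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def choose_alpha(student_probs, threshold=0.5, overtrust=1.5, disregard=0.0):
    """Assumed rule: an uncertain student overtrusts the teacher (alpha > 1),
    a confident student disregards it (alpha = 0)."""
    if normalized_entropy(student_probs) > threshold:
        return overtrust
    return disregard


def aggregate(student_logits, teacher_logits, alpha):
    """Mix logits as (1 - alpha) * student + alpha * teacher.
    alpha > 1 extrapolates past the teacher, i.e. 'overtrusts' it."""
    mixed = [(1 - alpha) * s + alpha * t
             for s, t in zip(student_logits, teacher_logits)]
    return softmax(mixed)


def decode_first_token(student_logits, teacher_logits):
    """Greedy choice of an initial token from the aggregated distribution."""
    alpha = choose_alpha(softmax(student_logits))
    probs = aggregate(student_logits, teacher_logits, alpha)
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, alpha


# Uncertain student: the teacher's preference (index 1) wins.
print(decode_first_token([0.1, 0.0, 0.05], [0.0, 3.0, 0.0]))
# Confident student: the teacher is ignored and index 0 is kept.
print(decode_first_token([5.0, 0.0, 0.0], [0.0, 3.0, 0.0]))
```

In this sketch only the initial token uses the teacher's logits; all later tokens would be generated by the small-scale model alone, conditioned on that first token, which matches the limited-supervision setting described above.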