CL, AI | Jun 11, 2024

Unused information in token probability distribution of generative LLM: improving LLM reading comprehension through calculation of expected values

arXiv:2406.10267v2 | 3 citations
Originality: Incremental advance
AI Analysis

This work addresses decoding inefficiencies in generative LLMs for researchers and practitioners, offering incremental improvements in reading comprehension tasks.

The paper aims to improve LLM reading comprehension by manipulating token probability distributions. It shows that replacing the greedy-decoded score with the expected value over the next-token distribution, after high-temperature scaling, boosts performance on the SummEval dataset: correlations with human judgment improve from 6%-8% to 13%-28% for 7B Mistral and from 20%-46% to 37%-56% for Mixtral.
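The difference between a greedy score and an expected-value score can be illustrated with a minimal sketch. It assumes the model is asked for a rating from 1 to 5 and that the logits of the five rating tokens are available. The logit values, the temperature of 10, and the function names are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_score(scores, logits):
    """Score under greedy decoding: the rating whose token has the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return scores[best]

def expected_score(scores, logits, temperature=1.0):
    """Expected rating under the temperature-scaled next-token distribution."""
    probs = softmax(logits, temperature)
    return sum(s * p for s, p in zip(scores, probs))

SCORES = [1, 2, 3, 4, 5]

# Made-up logits for the rating tokens "1".."5" on two hypothetical summaries.
logits_a = [0.1, 1.0, 2.5, 2.4, 0.3]  # "4" is almost as likely as "3"
logits_b = [0.1, 0.5, 2.5, 0.9, 0.2]  # "3" is clearly the most likely

# Greedy decoding gives both summaries the same rating.
print(greedy_score(SCORES, logits_a), greedy_score(SCORES, logits_b))

# The expected value separates them, at temperature 1 and at temperature 10.
for t in (1.0, 10.0):
    print(t, expected_score(SCORES, logits_a, t), expected_score(SCORES, logits_b, t))
```

In this toy case greedy decoding produces a tie, while the expected value ranks the first summary higher because it keeps the probability mass that greedy decoding discards. Since SummEval results are reported as correlations with human ratings, breaking such ties is one plausible way the extra information could help.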

LLM text decoding is a key component of perceived LLM quality. We present two experiments showing that decoding methods can be improved by manipulating token probabilities. First, we test a few LLMs on the SummEval summary scoring dataset to measure reading comprehension. We compare scores from greedy decoding with expected values over the next-token distribution. We scale logits by a large temperature to increase the entropy of the scores. This yields a strong improvement in performance on SummEval (in terms of correlation with human judgement). We see an improvement from 6%-8% to 13%-28% for 7B Mistral and from 20%-46% to 37%-56% for Mixtral, beating the GPT 4 0314 result on two metrics. Part of the gain seems related to positional bias. Second, we use a probability-based tree sampling algorithm to examine all of the most probable generations for a given prompt.
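The abstract does not spell out the tree sampling algorithm, so the following is only a generic sketch of the underlying idea: a best-first search over the token tree that lists every complete generation whose probability stays above a threshold. The toy next-token table, the `<eos>` marker, and the names `next_token_probs` and `enumerate_generations` are hypothetical stand-ins for a real model, not the paper's implementation.

```python
import heapq
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to next-token probabilities.
TOY_MODEL = {
    (): {"good": 0.6, "bad": 0.4},
    ("good",): {"summary": 0.7, "<eos>": 0.3},
    ("bad",): {"summary": 0.9, "<eos>": 0.1},
    ("good", "summary"): {"<eos>": 1.0},
    ("bad", "summary"): {"<eos>": 1.0},
}

def next_token_probs(prefix):
    """Stand-in for a model call that returns the next-token distribution."""
    return TOY_MODEL[prefix]

def enumerate_generations(min_prob):
    """Return all complete generations with probability >= min_prob, most probable first."""
    heap = [(0.0, ())]  # entries are (negative log-probability, prefix)
    results = []
    while heap:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == "<eos>":
            # A finished sequence: record it with its total probability.
            results.append((prefix, math.exp(-neg_logp)))
            continue
        for token, p in next_token_probs(prefix).items():
            child_neg_logp = neg_logp - math.log(p)
            # Prune branches whose probability already fell below the threshold.
            if math.exp(-child_neg_logp) >= min_prob:
                heapq.heappush(heap, (child_neg_logp, prefix + (token,)))
    return results

for tokens, prob in enumerate_generations(min_prob=0.1):
    print(" ".join(tokens), round(prob, 2))
```

Because a continuation can never be more probable than its prefix, popping the most probable node first returns finished generations in descending order of probability, and the threshold bounds how much of the tree is explored.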

Code Implementations: 1 repo
