CL · AI · Jul 29, 2024

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

arXiv:2407.19825v2 · 104 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses efficiency and clarity issues for users of LLMs in question answering, though it is incremental because it builds on existing prompt engineering techniques.

The paper tackles excessively verbose outputs from large language models (LLMs) in reasoning tasks by analyzing how output length affects correctness and cost. It introduces metrics for correct conciseness and a Constrained-CoT prompting strategy that improves conciseness across models and datasets.

Today's large language models (LLMs) can solve challenging question-answering tasks, and prompt engineering techniques, such as chain-of-thought (CoT), have gained attention for enhancing the explanation and correctness of outputs. However, many models and techniques tend to produce excessively verbose answers, which hurts both conciseness and generation time. To address this, this paper analyzes the impact of output length on LLM inference pipelines and proposes novel metrics to evaluate the "correct conciseness" of a model and related prompting techniques. Then, we examine the impact of controlling output length through a refined prompt engineering strategy, Constrained-CoT (CCoT), which encourages the model to produce more concise outputs. To better understand the effects of such a prompt, we also introduce two additional scores for analyzing conciseness, measured in terms of redundancy and information flow in generated answers. Experiments on pretrained LLMs and multiple datasets demonstrate the benefits of the proposed metrics and the effectiveness of CCoT across different models.
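To make the idea concrete, here is a minimal Python sketch of what a length-constrained CoT prompt and a "correct and concise" score could look like. The abstract does not give the exact prompt wording or metric definitions, so the prompt text, the word-based length budget, and the function names `build_ccot_prompt`, `within_budget`, and `concise_accuracy` are all assumptions made for illustration, not the paper's own definitions.

```python
from typing import Sequence


def build_ccot_prompt(question: str, max_words: int) -> str:
    """Append a step-by-step cue plus an explicit length limit to a question.

    The wording is an assumed example of a length-constrained CoT prompt.
    """
    return (
        f"{question}\n"
        f"Let's think step by step and limit the answer length to {max_words} words."
    )


def within_budget(answer: str, max_words: int) -> bool:
    """Return True if the answer has at most max_words whitespace-separated words."""
    return len(answer.split()) <= max_words


def concise_accuracy(
    answers: Sequence[str], correct_flags: Sequence[bool], max_words: int
) -> float:
    """Fraction of answers that are both correct and within the word budget.

    This is a hypothetical stand-in for a "correct conciseness" metric:
    a correct answer that exceeds the budget counts the same as a wrong one.
    """
    if len(answers) != len(correct_flags):
        raise ValueError("answers and correct_flags must have the same length")
    if not answers:
        return 0.0
    hits = sum(
        1
        for answer, is_correct in zip(answers, correct_flags)
        if is_correct and within_budget(answer, max_words)
    )
    return hits / len(answers)


if __name__ == "__main__":
    print(build_ccot_prompt("What is 2 + 2?", max_words=15))
    sample_answers = [
        "4",
        "The answer is four because two plus two equals four",
        "5",
    ]
    print(concise_accuracy(sample_answers, [True, True, False], max_words=5))
```

In this sketch, a plain accuracy score would credit two of the three sample answers, while the budgeted score credits only the first, which is the kind of gap between correctness and conciseness the abstract describes.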

