CL, AI | Sep 28, 2023

Stress Testing Chain-of-Thought Prompting for Large Language Models

arXiv:2309.16621v1 | 0.94 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work provides insights into the robustness of CoT prompting for improving reasoning in large language models, though it is incremental, building on prior research.

The study analyzed how perturbations in Chain-of-Thought (CoT) prompting affect the reasoning performance of GPT-3, finding that incorrect CoT values significantly reduce accuracy, while errors in operators or order have less impact.

This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies (Min et al., 2022), we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor accuracy. Correct values in the CoT are crucial for predicting correct answers. Moreover, incorrect demonstrations in which the CoT operators or the CoT order are wrong do not degrade performance as drastically as value-based perturbations do. This research deepens our understanding of CoT prompting and raises new questions about the capability of LLMs to learn reasoning in context.
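To make the three perturbation types concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a CoT demonstration can be represented as a list of short arithmetic steps. The example steps, the function names (`perturb_order`, `perturb_values`, `perturb_operators`), the operator swap table, and the random value range are all hypothetical choices made for illustration; the abstract does not specify how the perturbations were generated.

```python
import random
import re

# Hypothetical CoT demonstration: each step is a short arithmetic statement.
COT_STEPS = [
    "5 + 3 = 8",
    "8 * 2 = 16",
    "16 - 4 = 12",
]

# Assumed operator swaps; the source does not say which swaps were used.
OPERATOR_SWAPS = {"+": "-", "-": "+", "*": "/", "/": "*"}


def perturb_order(steps, rng):
    """CoT order perturbation: same steps, different sequence."""
    shuffled = list(steps)
    if len(set(steps)) < 2:
        return shuffled  # nothing to reorder
    while shuffled == list(steps):
        rng.shuffle(shuffled)
    return shuffled


def perturb_values(steps, rng):
    """CoT value perturbation: every number replaced, operators kept."""

    def replace(match):
        original = int(match.group())
        candidate = original
        while candidate == original:
            candidate = rng.randint(1, 99)  # assumed value range
        return str(candidate)

    return [re.sub(r"\d+", replace, step) for step in steps]


def perturb_operators(steps):
    """CoT operator perturbation: every operator swapped, numbers kept."""
    return [
        re.sub(r"[+\-*/]", lambda m: OPERATOR_SWAPS[m.group()], step)
        for step in steps
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    print("original: ", COT_STEPS)
    print("order:    ", perturb_order(COT_STEPS, rng))
    print("values:   ", perturb_values(COT_STEPS, rng))
    print("operators:", perturb_operators(COT_STEPS))
```

In an experiment of this kind, each perturbed list would be joined into the demonstration portion of a few-shot prompt, and accuracy on the final answers would be compared against the unperturbed prompt.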

