CL · AI · IR · LG · Apr 3, 2024

uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers?

arXiv:2404.02474v1 · 27 citations · h-index: 18 · SemEval
AI Analysis

This work addresses the challenge of improving AI's creative problem-solving for NLP applications, but it is incremental as it builds on an existing benchmark and methods.

This paper tackled the problem of enhancing large language models' lateral thinking abilities on a sentence puzzle benchmark, finding that compressed informative prompts and dynamic in-context learning significantly improved performance, with fine-tuning on generated data also boosting results on other commonsense datasets.

Inspired by human cognition, Jiang et al. (2023c) create a benchmark for assessing LLMs' lateral thinking, that is, thinking outside the box. Building upon this benchmark, we investigate how different prompting methods enhance LLMs' performance on this task to reveal their inherent capacity for outside-the-box thinking. Through participating in the Sentence Puzzle sub-task of SemEval-2024 Task 9, we explore prompt engineering methods: chain-of-thought (CoT) and direct prompting, enhancing prompts with informative descriptions, and contextualizing prompts using a retrieval-augmented generation (RAG) pipeline. Our experiments involve three LLMs: GPT-3.5, GPT-4, and Zephyr-7B-beta. We generate a dataset of thinking paths between riddles and options using GPT-4, validated by humans for quality. Our findings indicate that compressed informative prompts enhance performance and that dynamic in-context learning improves model performance significantly. Furthermore, fine-tuning Zephyr on our dataset enhances performance across other commonsense datasets, underscoring the value of innovative thinking.
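To make "dynamic in-context learning" concrete, the sketch below picks the few-shot examples most similar to each new riddle and places them ahead of the question in the prompt. It is only an illustration under stated assumptions, not the authors' pipeline: the example riddles are invented, the helper names (`retrieve`, `build_prompt`) are hypothetical, and a bag-of-words cosine similarity stands in for whatever retriever the paper's RAG pipeline actually uses.

```python
import math
import re
from collections import Counter

# Invented solved riddles for illustration; NOT taken from the benchmark.
EXAMPLES = [
    {"riddle": "A man shaves several times a day yet still has a beard. How?",
     "answer": "He is a barber."},
    {"riddle": "What has keys but cannot open locks?",
     "answer": "A piano."},
    {"riddle": "A woman fell off a fifty-step ladder and was not hurt. How?",
     "answer": "She fell off the bottom step."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k=2):
    """Return the k pool entries whose riddles are most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        pool,
        key=lambda ex: cosine(q, Counter(tokenize(ex["riddle"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(riddle, options, pool, k=2):
    """Assemble a few-shot prompt: retrieved examples first, then the question."""
    parts = [
        f"Riddle: {ex['riddle']}\nAnswer: {ex['answer']}"
        for ex in retrieve(riddle, pool, k)
    ]
    listed = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options))
    parts.append(f"Riddle: {riddle}\nOptions:\n{listed}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    question = "A barber shaves many times a day but keeps his beard. Why?"
    choices = ["He shaves his customers.", "His razor is blunt.", "None of the above."]
    print(build_prompt(question, choices, EXAMPLES, k=1))
```

The resulting string would be sent to the model in place of a fixed few-shot prompt; swapping `cosine` for an embedding-based similarity would leave the rest of the sketch unchanged.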

Code Implementations: 1 repo
