CL · Jun 7, 2024

BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

arXiv:2406.04947v1 · 26 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of assessing creative reasoning in language models for NLP researchers, but it is incremental as it applies existing techniques to a new benchmark.

The paper tackles the BRAINTEASER task, which evaluates language models' creative thinking on multiple-choice questions. It compares fine-tuning, zero-shot Chain of Thought prompting, and ReConcile consensus generation; the best method reaches an overall accuracy of 85% on the sentence puzzles subtask.

This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think "outside of the box". We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a "round table conference" approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.
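The "round table" consensus step can be pictured with a small sketch. This is not the authors' code and not the ReConcile implementation: it assumes each agent is a plain callable returning a choice index and a self-reported confidence, and that disagreement is resolved by a confidence-weighted vote after a few discussion rounds. The names `Agent`, `weighted_vote`, `round_table`, and `max_rounds` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# Hypothetical agent interface (an assumption, not from the paper):
# (question, choices, peers' previous votes) -> (choice index, confidence in [0, 1])
Vote = Tuple[int, float]
Agent = Callable[[str, List[str], List[Vote]], Vote]


def weighted_vote(votes: List[Vote]) -> int:
    """Sum confidences per choice and return the highest-scoring choice index."""
    scores = defaultdict(float)
    for choice, confidence in votes:
        scores[choice] += confidence
    # On equal scores, prefer the lowest choice index so the result is deterministic.
    return max(scores, key=lambda c: (scores[c], -c))


def round_table(question: str, choices: List[str], agents: List[Agent],
                max_rounds: int = 3) -> int:
    """Let agents revise their answers after seeing the previous round's votes."""
    # Initial round: every agent answers without seeing any peer votes.
    votes = [agent(question, choices, []) for agent in agents]
    for _ in range(max_rounds):
        # Stop discussing as soon as all agents pick the same choice.
        if len({choice for choice, _ in votes}) == 1:
            break
        # Discussion round: each agent sees all votes from the previous round.
        votes = [agent(question, choices, votes) for agent in agents]
    return weighted_vote(votes)
```

In practice each agent would wrap a call to one of the selected language models with a zero-shot prompt; here that part is deliberately left out, since the abstract does not specify the prompts or the APIs used.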

Code Implementations: 1 repo
