CL · May 23, 2024

Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs

arXiv:2405.14862v2 · 6 citations · h-index: 26 · EMNLP
Originality: Highly original
AI Analysis

Bitune addresses a bottleneck in decoder-only LLMs that matters to users who need stronger reasoning and language understanding, though the contribution is incremental in that it builds on existing pretrained models.

The authors address the limited expressiveness that unidirectional attention imposes on decoder-only LLMs. Their method, Bitune, incorporates bidirectional attention into prompt processing and yields significant performance improvements on commonsense reasoning, arithmetic, and language understanding tasks.

Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.
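
To make the distinction between causal and bidirectional attention concrete, here is a minimal sketch of an attention mask in which prompt tokens can see each other in both directions while the remaining tokens stay causal. This is only an illustration of the general idea at the mask level, not Bitune's implementation: the abstract does not say how the bidirectional features are combined with the causal ones, and the function names below are hypothetical.

```python
def causal_mask(n):
    """Standard causal mask: mask[i][j] is True when position i may attend to j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prompt_bidirectional_mask(prompt_len, total_len):
    """Causal mask with the prompt block opened up to full (two-way) attention.

    Assumption for illustration: the first `prompt_len` positions are the
    prompt and the rest are generated tokens.
    """
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must be between 0 and total_len")
    mask = causal_mask(total_len)
    # Let every prompt token attend to every other prompt token.
    for i in range(prompt_len):
        for j in range(prompt_len):
            mask[i][j] = True
    return mask


def render(mask):
    """Draw the mask as text: 'x' means attention is allowed, '.' means blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    # 3 prompt tokens followed by 2 generated tokens.
    print(render(prompt_bidirectional_mask(3, 5)))
```

With a 3-token prompt and 2 generated tokens, the first three rows show the prompt attending to itself in full, and the last two rows keep the usual causal pattern, so generated tokens never see positions that come after them.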

