Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
This work addresses a bottleneck in decoder-only LLMs that matters to users who need stronger reasoning and understanding. The contribution is incremental, since it builds on existing pretrained models.
The authors tackle the limited expressiveness that unidirectional attention imposes on decoder-only LLMs. Their method, Bitune, incorporates bidirectional attention into prompt processing and yields significant performance improvements on commonsense reasoning, arithmetic, and language understanding tasks.
Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.
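The abstract does not give implementation details. The sketch below therefore makes an assumption: it illustrates "bidirectional attention in prompt processing" with a prefix-style attention mask. Under this mask, prompt tokens attend to every prompt token, while later (generated) tokens stay causal. The helper names `causal_mask`, `prompt_bidirectional_mask`, and `masked_softmax` are hypothetical. This is not Bitune's actual method, which the source describes only at this high level.

```python
import math
from typing import List

def causal_mask(n: int) -> List[List[bool]]:
    """Standard decoder-only mask: token i may attend only to tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def prompt_bidirectional_mask(prompt_len: int, total_len: int) -> List[List[bool]]:
    """Hypothetical prefix-style mask (an assumption, not Bitune's exact scheme):
    every prompt token may attend to every other prompt token, while tokens
    after the prompt keep the usual causal pattern."""
    mask = causal_mask(total_len)
    for i in range(prompt_len):
        for j in range(prompt_len):
            mask[i][j] = True
    return mask

def masked_softmax(scores: List[float], mask_row: List[bool]) -> List[float]:
    """Softmax over the allowed positions; masked positions get weight 0."""
    peak = max(s for s, allowed in zip(scores, mask_row) if allowed)
    exps = [math.exp(s - peak) if allowed else 0.0 for s, allowed in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

def render(mask: List[List[bool]]) -> str:
    """Draw a mask: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if a else "." for a in row) for row in mask)

if __name__ == "__main__":
    # 3 prompt tokens followed by 2 generated tokens.
    print(render(causal_mask(5)))
    print()
    print(render(prompt_bidirectional_mask(3, 5)))
    # With equal scores, the first prompt token spreads attention over the
    # whole prompt under the prefix mask, but only sees itself when causal.
    zeros = [0.0] * 5
    print(masked_softmax(zeros, prompt_bidirectional_mask(3, 5)[0]))
    print(masked_softmax(zeros, causal_mask(5)[0]))
```

In this sketch, the tokens after the prompt keep a causal pattern, so autoregressive decoding still works. Only the prompt representation gains access to "future" prompt tokens.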