BERT-LSH: Reducing Absolute Compute For Attention
This work addresses the high computational cost of transformer models such as BERT, a concern for both researchers and practitioners. The method is novel but incremental: it combines Locality Sensitive Hashing (LSH) with an existing architecture.
The study tackles the computational inefficiency of BERT's attention mechanism by introducing BERT-LSH, which uses Locality Sensitive Hashing to approximate attention. The resulting model significantly reduces computational demand and, unexpectedly, outperforms the baseline in pretraining and fine-tuning tasks.
This study introduces BERT-LSH, a novel model that incorporates Locality Sensitive Hashing (LSH) to approximate the attention mechanism in the BERT architecture. We compare the computational efficiency and performance of this model with those of a standard baseline BERT model. Our findings reveal that BERT-LSH significantly reduces computational demand for the self-attention layer while unexpectedly outperforming the baseline model in pretraining and fine-tuning tasks. These results suggest that the LSH-based attention mechanism not only offers computational advantages but may also enhance the model's ability to generalize from its training data. For more information, visit our GitHub repository: https://github.com/leo4life2/algoml-final
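
To make the idea of approximating attention with LSH concrete, the following is a minimal NumPy sketch and not the implementation from the repository above. It assumes a random-hyperplane (sign-bit) hash family, a single hash table, and a fallback that always keeps the diagonal so every query attends to at least one key; none of these choices are stated in the abstract. The names `lsh_bucket_ids`, `lsh_attention`, `n_bits`, and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def lsh_bucket_ids(x, planes):
    """Map each row of x to an integer bucket from the signs of random projections."""
    bits = (x @ planes) > 0                        # (n, n_bits) sign pattern
    weights = 1 << np.arange(planes.shape[1])      # 1, 2, 4, ... one weight per bit
    return bits.astype(np.int64) @ weights         # (n,) bucket id per row


def lsh_attention(q, k, v, n_bits=4, seed=0):
    """Single-head attention restricted to query-key pairs that share a bucket.

    Assumptions (not from the source): random-hyperplane hashing, one hash
    table, and the diagonal is always kept so no softmax row is empty.
    Returns the attended values and the boolean mask of scored pairs.
    """
    n, d = q.shape
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((d, n_bits))      # shared by queries and keys

    q_buckets = lsh_bucket_ids(q, planes)
    k_buckets = lsh_bucket_ids(k, planes)

    # Pair (i, j) is scored only if query i and key j collide.
    mask = q_buckets[:, None] == k_buckets[None, :]
    mask |= np.eye(n, dtype=bool)                  # assumed fallback: keep the diagonal

    # Compute dot products only for the colliding pairs.
    rows, cols = np.nonzero(mask)
    scores = np.full((n, n), -np.inf)
    scores[rows, cols] = np.einsum("ij,ij->i", q[rows], k[cols]) / np.sqrt(d)

    # Row-wise softmax; pairs that were never scored get weight exactly zero.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, mask


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    q, k, v = (rng.standard_normal((16, 32)) for _ in range(3))
    out, mask = lsh_attention(q, k, v, n_bits=3)
    print("output shape:", out.shape)
    print("fraction of query-key pairs scored:", mask.mean())
```

In this sketch the saving comes from scoring only the colliding pairs, so the fraction printed at the end is a rough proxy for the share of dot products a full attention layer would have computed. Increasing `n_bits` makes buckets smaller and the approximation sparser; how the trade-off is actually set in BERT-LSH is not specified in the abstract.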