Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
This work addresses the need for better natural language processing tools for Hebrew parliamentary data. The contribution is incremental: it applies domain-specific fine-tuning to an existing architecture.
The researchers address the problem of modeling parliamentary language in Hebrew by fine-tuning the DictaBERT language model on Israeli parliamentary proceedings, which yields significant improvements in perplexity and accuracy over the baseline model.
We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is based on the DictaBERT architecture and demonstrates significant improvements in modeling parliamentary language, as measured on the masked language modeling (MLM) task. We provide a detailed evaluation of the model's performance, showing improvements in both perplexity and accuracy over the baseline DictaBERT model.
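For readers unfamiliar with the two metrics, the following minimal sketch shows how perplexity and top-1 accuracy are commonly computed over masked positions in an MLM evaluation. It illustrates the standard definitions only and is not the authors' evaluation code. The function name `mlm_metrics`, its input format, and the probability floor `1e-12` are assumptions made for this example.

```python
import math


def mlm_metrics(predictions, gold_tokens):
    """Compute (perplexity, top-1 accuracy) over masked positions.

    predictions: one dict per masked position, mapping candidate token -> probability.
    gold_tokens: the original (masked-out) token at each position.
    Both the name and the input format are hypothetical, chosen for illustration.
    """
    if not gold_tokens or len(predictions) != len(gold_tokens):
        raise ValueError("need one non-empty prediction per gold token")

    total_nll = 0.0
    correct = 0
    for dist, gold in zip(predictions, gold_tokens):
        # Floor the probability so a missing gold token does not produce log(0).
        p_gold = max(dist.get(gold, 0.0), 1e-12)
        total_nll += -math.log(p_gold)
        # Top-1 accuracy: the most probable candidate must equal the gold token.
        if max(dist, key=dist.get) == gold:
            correct += 1

    n = len(gold_tokens)
    perplexity = math.exp(total_nll / n)  # exp of the mean negative log-likelihood
    accuracy = correct / n
    return perplexity, accuracy
```

Lower perplexity means the model assigns higher probability to the true masked tokens, while higher accuracy means its single best guess matches the true token more often. A fine-tuned model is compared against its baseline by computing both values on the same held-out masked positions.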