CL, Aug 3, 2022

Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

arXiv:2208.01875v1, 6 citations, h-index: 39
Originality: Synthesis-oriented
AI Analysis

This paper addresses a domain-specific problem for researchers and practitioners in computational linguistics and Jewish studies: it provides a tool tailored to processing Rabbinic Hebrew texts.

The authors address the lack of pre-trained language models for Rabbinic Hebrew by introducing BEREL, a model trained specifically on Rabbinic texts, and show that it outperforms existing Hebrew models on a homograph challenge set.

We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBERT), they are all trained on modern Hebrew texts, and modern Hebrew diverges substantially from Rabbinic Hebrew in its lexicographical, morphological, syntactic and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We release the new model and homograph challenge set for unrestricted use.

