CL | Aug 3, 2022

Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel

arXiv:2208.01875v1 | 0.86 citations | h-index: 39

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem for researchers and practitioners in computational linguistics and Jewish studies by providing a language model tailored to Rabbinic Hebrew texts.

The authors tackled the lack of pre-trained language models for Rabbinic Hebrew by introducing BEREL, a model trained specifically on Rabbinic texts, and demonstrated its superiority over existing models on a homograph challenge set.

We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBert), they are all trained on modern Hebrew texts, which diverge substantially from Rabbinic Hebrew in their lexicographical, morphological, syntactic and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We release the new model and homograph challenge set for unrestricted use.
