Fine-Tuning BERTs for Definition Extraction from Mathematical Text
This work addresses definition extraction for mathematical text processing, but its contribution is incremental: it applies existing fine-tuning methods to new datasets.
The paper tackles the problem of extracting definitions from mathematical text by fine-tuning BERT models for binary classification. Measured by accuracy, recall, and precision, the models achieve results comparable to those of earlier models with less computational effort.
In this paper, we fine-tuned three pre-trained BERT models on the task of "definition extraction" from mathematical English written in LaTeX. We frame this as a binary classification problem: a sentence either contains a definition of a mathematical term or it does not. We used two original datasets, "Chicago" and "TAC," to fine-tune and test these models. We also tested on WFMALL, a dataset presented by Vanetik and Litvak in 2021, and compared the performance of our models to theirs. We found that a high-performance Sentence-BERT transformer model performed best based on overall accuracy, recall, and precision, achieving results comparable to those of the earlier models with less computational effort.
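To make the evaluation setup concrete, the following minimal sketch shows how accuracy, precision, and recall can be computed for a binary definition/non-definition classifier. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics`, the convention that label 1 means "contains a definition," and the toy labels are all assumptions introduced here.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels.

    Assumed convention (hypothetical): 1 = sentence contains a definition,
    0 = it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard against empty denominators by returning 0.0 in that case.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy gold labels and predictions, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision measures how many sentences flagged as definitions really are definitions, while recall measures how many true definitions were found. Reporting both alongside accuracy helps when definition sentences are much rarer than non-definition sentences, since accuracy alone can look high for a classifier that rarely predicts the positive class.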