Mevaker: Conclusion Extraction and Allocation Resources for the Hebrew Language
This work addresses a domain-specific need for Hebrew natural language processing (NLP) tools for analyzing government documents.
The authors address conclusion extraction and conclusion allocation for Hebrew-language documents. They build new datasets from Israeli State Comptroller reports and develop models for both tasks, and they make all resources publicly available.
In this paper, we introduce two Hebrew-language datasets based on reports of the State Comptroller and Ombudsman of Israel: MevakerSumm for summarization and MevakerConc for conclusion extraction. We also release two auxiliary datasets. We accompany these datasets with models for conclusion extraction (HeConE, HeConEspc) and conclusion allocation (HeCross). All of the code, datasets, and model checkpoints used in this work are publicly available.