The GELATO Dataset for Legislative NER
This work addresses the need for specialized NER tools in legislative analysis. Its modeling contribution is incremental, since it builds on existing transformer and LLM methods.
The authors tackle named entity recognition in U.S. legislative texts by introducing the GELATO dataset, annotated with a novel two-level ontology, and by fine-tuning transformer models on it; RoBERTa performs strongly, while BERT performs comparatively weakly.
This paper introduces GELATO (Government, Executive, Legislative, and Treaty Ontology), a dataset of U.S. House and Senate bills from the 118th Congress annotated using a novel two-level named entity recognition ontology designed for U.S. legislative texts. We fine-tune transformer-based models (BERT, RoBERTa) of different architectures and sizes on this dataset for first-level prediction. We then use LLMs with optimized prompts to perform second-level prediction. The strong performance of RoBERTa and the relatively weak performance of BERT models, together with the application of LLMs as second-level predictors, support future research in legislative NER and in downstream tasks that use these model combinations as extraction tools.
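The two-stage design described above can be illustrated with a minimal sketch: a first-level tagger proposes entity spans with coarse labels, and a second-level step asks an LLM to choose a finer label from the children of that coarse label. Everything here is an assumption made for illustration. The ontology labels (`ORG`, `DOC`, `AGENCY`, and so on) are placeholders rather than GELATO's actual labels, `toy_first_level` is a gazetteer lookup standing in for a fine-tuned RoBERTa token classifier, `toy_llm` is a rule-based stand-in for a real LLM call, and the prompt wording is hypothetical, not the authors' optimized prompt.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    text: str
    start: int
    end: int
    level1: str
    level2: str = ""


# Hypothetical two-level ontology: placeholder labels, NOT GELATO's real ones.
ONTOLOGY: Dict[str, List[str]] = {
    "ORG": ["AGENCY", "COMMITTEE"],
    "DOC": ["ACT", "CODE_SECTION"],
}


def toy_first_level(text: str) -> List[Span]:
    """Stand-in for a fine-tuned token classifier: a simple gazetteer lookup."""
    gazetteer = {"Department of Energy": "ORG", "Clean Air Act": "DOC"}
    spans = []
    for phrase, label in gazetteer.items():
        idx = text.find(phrase)
        if idx != -1:
            spans.append(Span(phrase, idx, idx + len(phrase), label))
    return sorted(spans, key=lambda s: s.start)


def build_prompt(span: Span, context: str, allowed: List[str]) -> str:
    """Hypothetical prompt that restricts the LLM to the coarse label's children."""
    return (
        f"Sentence: {context}\n"
        f"Entity: {span.text}\n"
        f"Allowed labels: {', '.join(allowed)}\n"
        "Answer with exactly one allowed label."
    )


def toy_llm(prompt: str) -> str:
    """Rule-based stand-in for an LLM call; reads the 'Entity:' line only."""
    entity = next(
        line[len("Entity: "):]
        for line in prompt.splitlines()
        if line.startswith("Entity: ")
    )
    if entity.startswith("Department"):
        return "agency"
    if entity.endswith("Act"):
        return "act"
    return "other"


def annotate(
    text: str,
    first_level: Callable[[str], List[Span]],
    llm: Callable[[str], str],
) -> List[Span]:
    """Run first-level tagging, then refine each span with the second-level step."""
    refined = []
    for span in first_level(text):
        allowed = ONTOLOGY[span.level1]
        answer = llm(build_prompt(span, text, allowed)).strip().upper()
        # Reject any answer outside the allowed children of the coarse label.
        level2 = answer if answer in allowed else "UNKNOWN"
        refined.append(replace(span, level2=level2))
    return refined


example = "The Department of Energy shall report under the Clean Air Act."
for s in annotate(example, toy_first_level, toy_llm):
    print(s.text, s.level1, s.level2)
```

Validating the LLM's answer against the coarse label's children keeps the second level consistent with the first, so a malformed or off-ontology response degrades to `UNKNOWN` instead of introducing an invalid label. In practice, the two stand-in callables would be swapped for a real token-classification model and a real LLM client.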