CL, Jun 2, 2025

Novel Benchmark for NER in the Wastewater and Stormwater Domain

arXiv:2506.01938v1, h-index: 17, CIST
Originality Synthesis-oriented
AI Analysis

This work addresses domain-specific information extraction for wastewater management professionals, but its contribution is incremental, since it focuses on creating a benchmark and a baseline.

The study tackles the challenge of extracting structured knowledge from wastewater and stormwater management texts by developing a French-Italian domain-specific corpus for Named Entity Recognition (NER) and evaluating state-of-the-art methods, including LLM-based approaches, to establish a reliable baseline.

Effective wastewater and stormwater management is essential for urban sustainability and environmental protection. Extracting structured knowledge from reports and regulations is challenging due to domain-specific terminology and multilingual contexts. This work focuses on domain-specific Named Entity Recognition (NER) as a first step towards effective relation and information extraction to support decision making. A multilingual benchmark is crucial for evaluating these methods. This study develops a French-Italian domain-specific text corpus for wastewater management. It evaluates state-of-the-art NER methods, including LLM-based approaches, to provide a reliable baseline for future strategies and explores automated annotation projection in view of an extension of the corpus to new languages.
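
To make the idea of an NER baseline concrete, here is a minimal sketch of how predictions might be scored against a gold-annotated corpus. It assumes BIO-style tags and exact-match, entity-level precision, recall, and F1, which is a common convention for NER but is not confirmed by the abstract as the paper's metric. The entity types `POLLUTANT` and `FACILITY` and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    Stray I- tags that do not continue an open span of the same type
    are ignored (an assumption; other schemes repair them instead).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the currently open span
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 for one sequence."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)  # spans matching in boundaries and type
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical four-token sentence with illustrative entity types.
gold_tags = ["B-POLLUTANT", "I-POLLUTANT", "O", "B-FACILITY"]
pred_tags = ["B-POLLUTANT", "I-POLLUTANT", "O", "O"]
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

In this toy case the model finds one of the two gold entities and predicts nothing spurious, so precision is perfect while recall is halved. Exact-match scoring is strict: a prediction with the right type but a shifted boundary counts as both a false positive and a false negative.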

