CL · Jul 29, 2024

Segmentation en phrases : ouvrez les guillemets sans perdre le fil (Sentence segmentation: open the quotation marks without losing the thread)

arXiv:2407.19808v1 · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific NLP task for text processing, but it is incremental as it builds on existing methods and datasets.

The paper tackles sentence segmentation in XML documents, particularly handling nested structures like quotations and parentheses, and reports performance improvements over 2019 benchmarks on the same dataset.

This paper presents a graph cascade for sentence segmentation of XML documents. Our proposal supports sentences nested inside sentences for cases introduced by quotation marks and hyphens, and pays particular attention to parenthetical insertions (incises) introduced by parentheses and to lists introduced by colons. We describe how the tool works and compare its results with those available in 2019 on the same dataset, together with an evaluation of the system's performance on a test corpus.

