CL IR LG · Dec 5, 2022

Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal "that" in Noun Complement Clauses vs. Relative Clauses

arXiv:2212.02591v1

Originality: Synthesis-oriented

AI Analysis

This work addresses a subtle parsing distinction in computational linguistics. Its contribution is incremental: it builds on existing methods and corpora to tackle a specific linguistic problem.

The paper tackles the problem of distinguishing noun complement clauses from relative clauses introduced by postnominal "that" in English parsing. It uses a probabilistic decision tree (TreeTagger) to learn this distinction and evaluates the impact of training set size and corpus representativeness. The results highlight that the distinction is learnable, but no concrete accuracy numbers are provided.

In this paper we investigated two different methods to parse relative and noun complement clauses in English, using distinct tags for the corresponding "that" as a relative pronoun and as a complementizer. In our first experiment, we used an algorithm to relabel a corpus from the GUM Treebank parsed with Universal Dependencies. Our second experiment consisted of using TreeTagger, a Probabilistic Decision Tree, to learn the distinction between the complement and relative uses of postnominal "that". We investigated the effect of the training set size on TreeTagger accuracy and how representative the GUM Treebank files are of the two structures under scrutiny. We discussed some of the linguistic and structural underpinnings of the learnability of this distinction.
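The relabeling step can be pictured as a rule over Universal Dependencies trees. The following is a minimal sketch, not the authors' algorithm. It assumes that, as in the UD English treebanks, relativizer "that" is tagged PRON inside a clause attached to its noun with `acl:relcl`, while complementizer "that" is SCONJ with deprel `mark` inside a clause attached with plain `acl`. The tags `that_REL` and `that_COMP` and the helpers `parse_conllu`, `classify_that` and `row` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def parse_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences, skipping comments, multiword ranges and empty nodes."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token ranges (3-4) and empty nodes (5.1)
        current.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def classify_that(sentence: list[Token]) -> dict[int, str]:
    """Map token ids of postnominal 'that' to a hypothetical REL or COMP tag."""
    by_id = {tok.id: tok for tok in sentence}
    labels: dict[int, str] = {}
    for tok in sentence:
        if tok.form.lower() != "that":
            continue
        clause_head = by_id.get(tok.head)  # predicate of the embedded clause
        if clause_head is None:
            continue
        nominal = by_id.get(clause_head.head)  # word the clause attaches to
        if nominal is None or nominal.upos not in {"NOUN", "PROPN"}:
            continue  # not postnominal, e.g. "I think that ..."
        if clause_head.deprel == "acl:relcl" and tok.upos == "PRON":
            labels[tok.id] = "that_REL"   # relative pronoun: "the book that she wrote"
        elif clause_head.deprel == "acl" and tok.deprel == "mark":
            labels[tok.id] = "that_COMP"  # complementizer: "the fact that he left"
    return labels


def row(i: int, form: str, upos: str, head: int, deprel: str) -> str:
    # Toy CoNLL-U line: LEMMA, XPOS, FEATS, DEPS and MISC are left as "_"
    return "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


SAMPLE = "\n".join([
    # "The fact that he left surprised us ."
    row(1, "The", "DET", 2, "det"), row(2, "fact", "NOUN", 6, "nsubj"),
    row(3, "that", "SCONJ", 5, "mark"), row(4, "he", "PRON", 5, "nsubj"),
    row(5, "left", "VERB", 2, "acl"), row(6, "surprised", "VERB", 0, "root"),
    row(7, "us", "PRON", 6, "obj"), row(8, ".", "PUNCT", 6, "punct"),
    "",
    # "I read the book that she wrote ."
    row(1, "I", "PRON", 2, "nsubj"), row(2, "read", "VERB", 0, "root"),
    row(3, "the", "DET", 4, "det"), row(4, "book", "NOUN", 2, "obj"),
    row(5, "that", "PRON", 7, "obj"), row(6, "she", "PRON", 7, "nsubj"),
    row(7, "wrote", "VERB", 4, "acl:relcl"), row(8, ".", "PUNCT", 2, "punct"),
    "",
    # "I think that she left ." (verbal complement, not postnominal)
    row(1, "I", "PRON", 2, "nsubj"), row(2, "think", "VERB", 0, "root"),
    row(3, "that", "SCONJ", 5, "mark"), row(4, "she", "PRON", 5, "nsubj"),
    row(5, "left", "VERB", 2, "ccomp"), row(6, ".", "PUNCT", 2, "punct"),
])

for sent in parse_conllu(SAMPLE):
    print(classify_that(sent))
# {3: 'that_COMP'}
# {5: 'that_REL'}
# {}
```

Restricting the rule to clauses attached to a NOUN or PROPN keeps only postnominal "that", so the complement of a verb such as "think" receives no tag. Real GUM annotations may contain cases this simple rule does not cover, which is one reason a learned tagger is also worth evaluating.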
