Refining Syntactic Distinctions Using Decision Trees: A Paper on Postnominal 'That' in Complement vs. Relative Clauses
This work addresses a specific linguistic parsing challenge for NLP researchers, but it is incremental as it builds on existing models and datasets.
The study tackled the problem of distinguishing between 'that' as a relative pronoun and complementizer in English clauses by retraining the TreeTagger model, resulting in improved accuracy through fine-tuning and dataset size adjustments.
In this study, we first tested the performance of the TreeTagger English model developed by Helmut Schmid with test files at our disposal, using this model to analyze relative clauses and noun complement clauses in English. We distinguished between the two uses of "that," both as a relative pronoun and as a complementizer. To achieve this, we employed an algorithm to reannotate a corpus that had originally been parsed using the Universal Dependency framework with the EWT Treebank. In the next phase, we proposed an improved model by retraining TreeTagger and compared the newly trained model with Schmid's baseline model. This process allowed us to fine-tune the model's performance to more accurately capture the subtle distinctions in the use of "that" as a complementizer and as a nominal. We also examined the impact of varying the training dataset size on TreeTagger's accuracy and assessed the representativeness of the EWT Treebank files for the structures under investigation. Additionally, we analyzed some of the linguistic and structural factors influencing the ability to effectively learn this distinction.