CL · Mar 15, 2022

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

arXiv:2203.08299v4 · 271 citations · h-index: 44
AI Analysis

This work addresses a computational bottleneck in natural language processing for researchers and practitioners who need efficient syntactic similarity analysis. The contribution is incremental, however, because it builds on existing tree kernel methods.

The authors tackle the inefficiency and inconsistency of existing syntactic similarity metrics by developing FastKASSIM. The metric uses tree kernels to pair the most similar constituency parse trees across two documents and average their similarity scores. It runs up to 5.32 times faster than its predecessor on documents from the r/ChangeMyView corpus and is more robust to syntactically dissimilar documents.

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
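The abstract describes the metric only at a high level. Below is a minimal Python sketch of the general idea under several assumptions that are not taken from the paper. It uses a Collins-Duffy style subset tree kernel with a decay factor `lam` and normalizes it so that identical trees score 1. It encodes delexicalized parse trees (words dropped, POS tags as childless nodes) as nested tuples. At the document level, it averages each tree's best-match score, symmetrized over both directions. The names `subset_tree_kernel`, `tree_similarity`, and `document_similarity` are hypothetical, and this is not FastKASSIM's actual implementation. Producing the constituency parses themselves is out of scope.

```python
import math
from collections import defaultdict
from functools import lru_cache

# Hypothetical representation: a tree is a nested tuple (label, child, ...).
# Assumption: trees are delexicalized, so POS tags are childless nodes like ("NN",).
Tree = tuple


def production(node: Tree) -> tuple:
    """A node's production: its label plus the labels of its children."""
    return (node[0], tuple(child[0] for child in node[1:]))


def nodes(tree: Tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 0.5) -> float:
    """Collins-Duffy style subset tree kernel (an assumed choice of kernel)."""

    @lru_cache(maxsize=None)
    def common(n1: Tree, n2: Tree) -> float:
        # Weighted count of shared tree fragments rooted at n1 and n2.
        if production(n1) != production(n2):
            return 0.0
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            result *= 1.0 + common(c1, c2)
        return result

    # common() is zero unless productions match, so only compare matching pairs.
    by_prod = defaultdict(list)
    for n in nodes(t2):
        by_prod[production(n)].append(n)
    return sum(common(n1, n2) for n1 in nodes(t1) for n2 in by_prod[production(n1)])


def tree_similarity(t1: Tree, t2: Tree, lam: float = 0.5) -> float:
    """Normalized kernel in [0, 1]; identical trees score 1."""
    denom = math.sqrt(subset_tree_kernel(t1, t1, lam) * subset_tree_kernel(t2, t2, lam))
    return subset_tree_kernel(t1, t2, lam) / denom if denom > 0 else 0.0


def document_similarity(doc_a: list, doc_b: list, lam: float = 0.5) -> float:
    """Average best-match tree similarity; symmetrizing is an assumption."""
    if not doc_a or not doc_b:
        return 0.0

    def one_way(src, dst):
        # Pair each tree in src with its most similar tree in dst, then average.
        return sum(max(tree_similarity(s, d, lam) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(doc_a, doc_b) + one_way(doc_b, doc_a))


s1 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBD",)))
s2 = ("S", ("NP", ("PRP",)), ("VP", ("VBD",), ("NP", ("DT",), ("NN",))))
print(subset_tree_kernel(s1, s2))           # → 3.125
print(round(tree_similarity(s1, s1), 3))    # → 1.0
print(document_similarity([s1], [s1, s2]))  # between 0 and 1
```

Grouping nodes by production skips pairs whose kernel contribution is zero anyway, which is one common way to make tree kernels cheaper to evaluate. The best-match pairing means that a long document is not penalized for also containing sentences with no close counterpart in the other document.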

Code Implementations: 1 repo
