CL Mar 15, 2022

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Maximillian Chen, Caitlyn Chen, Xiao Yu, Zhou Yu

arXiv:2203.08299v423.4271 citations h-index: 44 Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in natural language processing for researchers and practitioners needing efficient syntactic similarity analysis, though it is incremental as it builds on existing tree kernel methods.

The authors tackled the problem of inefficient and inconsistent syntactic similarity metrics by developing FastKASSIM, which uses tree kernels to pair and average constituency parse trees, resulting in up to 5.32 times faster computation and improved robustness over a predecessor on the r/ChangeMyView corpus.

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
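
The "pair and average" idea in the abstract can be illustrated with a minimal Python sketch. This is not the FastKASSIM implementation: the abstract does not specify the kernel or the exact pairing rule, so everything here is an assumption made for illustration. In particular, `toy_kernel` (cosine similarity over production counts) is a simple stand-in for a real tree kernel, the symmetric best-match averaging in `document_similarity` is one plausible reading of "pairs and averages the most similar constituency parse trees", and the nested-tuple tree format is hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumed tree format: (label, child, child, ...); words are plain strings.


def productions(tree):
    """Count non-lexical productions (parent label -> child labels) in a tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # a word, not a syntactic node
        label, children = node[0], node[1:]
        rhs = tuple(c[0] for c in children if not isinstance(c, str))
        if rhs:  # skip preterminals such as ("DT", "the") so words are ignored
            counts[(label, rhs)] += 1
        stack.extend(children)
    return counts


def toy_kernel(t1, t2):
    """Cosine similarity of production counts; a stand-in for a tree kernel."""
    p1, p2 = productions(t1), productions(t2)
    dot = sum(p1[k] * p2[k] for k in p1)
    norm = sqrt(sum(v * v for v in p1.values())) * sqrt(sum(v * v for v in p2.values()))
    return dot / norm if norm else 0.0


def document_similarity(doc_a, doc_b, kernel=toy_kernel):
    """Average each tree's best-match score against the other document.

    Assumption: best matches are taken in both directions and pooled.
    """
    if not doc_a or not doc_b:
        return 0.0
    best_a = [max(kernel(a, b) for b in doc_b) for a in doc_a]
    best_b = [max(kernel(b, a) for a in doc_a) for b in doc_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


if __name__ == "__main__":
    s1 = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "sat")))
    s2 = ("S", ("NP", ("DT", "a"), ("NN", "dog")), ("VP", ("VBD", "ran")))
    s3 = ("S", ("VP", ("VB", "run")))
    print(document_similarity([s1, s3], [s2]))
```

Because the toy kernel ignores words, `s1` and `s2` score as syntactically identical even though they share no vocabulary, while `s3` shares no productions with either. Note that the naive all-pairs loop costs one kernel evaluation per tree pair, which is the kind of cost that makes efficiency matter for large documents.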
