CL · Apr 5, 2017

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

arXiv:1704.01346v1 · 20 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of detecting semantic similarity across languages for researchers and practitioners in NLP, but it is incremental as it builds on existing methods for a specific task.

The paper applies cross-language plagiarism detection methods to semantic textual similarity, developing several methods and combinations of them. Its best run ranked first on Track 4a of SemEval-2017 STS, with a correlation of 83.02% with human annotations.

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity with a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on Track 4a with a correlation of 83.02% with human annotations.
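As a rough illustration of the setup the abstract describes, the sketch below averages the scores of several methods (one possible unsupervised combination) and measures the correlation of the result with gold annotations. It rests on assumptions not stated above: the correlation is taken to be Pearson's, the combination is a plain mean clipped to the 0-5 range, and the function names and all scores are hypothetical, not taken from the paper.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def combine_unsupervised(method_scores):
    """Average per-pair scores across methods and clip to the 0-5 STS range.

    Assumption: a plain mean stands in for the paper's unsupervised combination.
    """
    combined = []
    for per_pair in zip(*method_scores):
        combined.append(min(5.0, max(0.0, mean(per_pair))))
    return combined


# Hypothetical scores from two methods for four sentence pairs (not from the paper).
method_a = [4.5, 1.0, 3.0, 0.5]
method_b = [4.0, 2.0, 3.5, 1.5]
gold = [5.0, 1.0, 3.0, 0.0]

combined = combine_unsupervised([method_a, method_b])
print(combined)
print(round(pearson(combined, gold), 4))
```

A supervised combination would instead fit weights for the individual method scores on annotated training pairs; that step is left out here because the abstract gives no detail on the model used.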

Code Implementations: 1 repo
