CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
This work addresses cross-language semantic similarity detection, a problem relevant to NLP researchers and practitioners. Its contribution is incremental: it builds on existing methods and applies them to a specific task.
The paper applies cross-language plagiarism detection methods, individually and in combination, to semantic textual similarity, ranking first on SemEval-2017 Track 4a with a correlation of 83.02% with human annotations.
We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a Spanish-English sentence pair, each system must estimate the semantic similarity of the two sentences as a score between 0 and 5. Our submission uses syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in both unsupervised and supervised ways. Our best run ranked 1st on Track 4a, with a correlation of 83.02% with human annotations.
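To make the combination and evaluation steps concrete, the following is a minimal sketch and not the submitted system. It assumes the unsupervised combination is a plain average of per-method scores and that the reported correlation is the Pearson coefficient; the abstract states neither explicitly. The function names `average_fusion` and `pearson` are hypothetical, and the scores are toy values made up for illustration, not results.

```python
from math import sqrt


def average_fusion(method_scores):
    """Assumed unsupervised combination: mean of the per-method scores.

    method_scores holds one list per method, each with one 0-5 score
    per sentence pair, all in the same pair order.
    """
    n_methods = len(method_scores)
    return [sum(pair_scores) / n_methods for pair_scores in zip(*method_scores)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy scores for four sentence pairs from three hypothetical methods.
method_scores = [
    [4.5, 1.0, 3.0, 0.5],
    [4.0, 2.0, 2.5, 1.0],
    [5.0, 0.0, 3.5, 0.0],
]
# Toy human annotations on the same 0-5 scale.
gold = [4.8, 0.6, 3.2, 0.4]

fused = average_fusion(method_scores)
correlation = pearson(fused, gold)
print(f"fused scores: {fused}")
print(f"correlation with gold: {correlation:.4f}")
```

A supervised combination would instead learn weights for the per-method scores from annotated pairs; that variant is omitted here because the abstract does not specify the learner.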