CL, Nov 13, 2018

Cross-lingual Short-text Matching with Deep Learning

arXiv:1811.05569v1, 1 citation

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of identifying semantically similar questions across languages, with applications such as question answering sites and chatbots. The contribution is incremental, since it builds on existing deep learning advances.

The paper tackles cross-lingual short-text matching of question pairs, reporting log-loss scores of 0.35 and 0.39 in the first and second rounds of the CIKM AnalytiCup 2018 contest and a rank of 7th out of 1027 teams.

The problem of short-text matching is formulated as follows: given a pair of sentences or questions, a matching model determines whether the two inputs have the same meaning. Models that can automatically identify questions with the same meaning have a wide range of applications in question answering sites and modern chatbots. In this article, we describe the approach taken by team hahu to solve this problem in the context of the "CIKM AnalytiCup 2018 - Cross-lingual Short-text Matching of Question Pairs" contest, which is sponsored by Alibaba. Our solution is an end-to-end system based on current advances in deep learning; it avoids heavy feature engineering and achieves improved performance over traditional machine-learning approaches. The log-loss scores for the first and second rounds of the contest are 0.35 and 0.39, respectively. The team was ranked 7th out of 1027 teams in the organizers' overall ranking scheme, which combined the two contest scores with innovation, system integrity, understanding of the data, and the practicality of the solution for business.
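To make the reported metric concrete, the following is a minimal sketch of how a log-loss score can be computed for a set of question pairs. It assumes the standard binary log-loss (mean negative log-likelihood, natural logarithm) over predicted match probabilities; the abstract does not spell out the contest's exact formula, and the function name `log_loss`, the clipping constant `eps`, and the toy labels and probabilities are illustrative, not taken from the paper.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log-loss (assumed form of the contest metric).

    labels: 1 if the two questions mean the same, else 0.
    probs:  predicted probability that the pair matches.
    eps:    hypothetical clipping constant to avoid log(0).
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy example: four question pairs with made-up predictions.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
print(log_loss(labels, probs))
```

Lower values are better: a model that outputs 0.5 for every pair scores ln 2 (about 0.693) under this definition, so scores of 0.35 and 0.39 indicate predictions considerably more informative than chance.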
