CL · Apr 13, 2020

Integrated Eojeol Embedding for Erroneous Sentence Classification in Korean Chatbots

arXiv:2004.05744v1 · 1 citation
AI Analysis

This work addresses a specific issue for Korean chatbot developers by improving how erroneous input is handled, but it is incremental because it builds on existing classification methods.

The paper tackles Korean sentence classification in chatbots when input sentences contain spelling or spacing errors, which disrupt morphological analysis and tokenization. It proposes an Integrated Eojeol Embedding approach that classifies erroneous sentences more accurately than the baseline by 17 percentage points.

This paper analyzes a Korean sentence classification system for a chatbot. Sentence classification is the task of assigning an input sentence to one of a set of predefined categories. However, spelling or spacing errors in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach, Integrated Eojeol Embedding (an Eojeol is a Korean syntactic word separated by spaces), to reduce the effect that poorly analyzed morphemes may have on sentence classification. It also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that the proposed system classifies erroneous sentences more accurately than the baseline system by 17%p.

