CL, AI · Aug 31, 2020

Classifier Combination Approach for Question Classification for Bengali Question Answering System

Somnath Banerjee, Sudip Kumar Naskar, Paolo Rosso, Sivaji Bandyopadhyay

arXiv:2008.13597v2 · 5 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses question classification for Bengali question answering systems, with potential applications to other Indo-Aryan languages. However, it is incremental, since it applies known classifier-combination techniques to a single language.

The authors tackled question classification for Bengali by combining multiple classifiers, achieving a 4.02% improvement over single classifiers for coarse-grained classes and 87.79% accuracy with stacking for fine-grained classes.

Question classification (QC) is a prime constituent of an automated question answering system. The work presented here demonstrates that combining multiple models achieves better classification performance than the existing individual models for the question classification task in Bengali. We have exploited state-of-the-art multiple-model combination techniques, i.e., ensemble, stacking and voting, to increase QC accuracy. Lexical, syntactic and semantic features of Bengali questions are used for four well-known classifiers, namely Naïve Bayes, kernel Naïve Bayes, Rule Induction, and Decision Tree, which serve as our base learners. A single-layer question-class taxonomy with 8 coarse-grained classes is extended to a two-layer taxonomy by adding 69 fine-grained classes. We carried out the experiments on both the single-layer and the two-layer taxonomies. Experimental results confirmed that classifier combination approaches outperform single-classifier approaches by 4.02% for coarse-grained question classes. Overall, the stacking approach produces the best results for fine-grained classification, achieving an accuracy of 87.79%. The approach presented here could be used to develop question answering systems for other Indo-Aryan or Indic languages.
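The abstract names voting and stacking as combination techniques but does not give their mechanics. The following is a minimal Python sketch of both ideas, not the authors' implementation. Everything in it is a hypothetical stand-in: the three keyword-rule "base learners", the class labels (PERSON, LOCATION, NUMERIC) and the English toy questions replace the paper's Naïve Bayes, kernel Naïve Bayes, Rule Induction and Decision Tree models over Bengali lexical, syntactic and semantic features. The meta-learner is deliberately simple: a lookup table from each tuple of base predictions to the label it most often co-occurred with.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A base learner is modelled as any function: question -> class label.
Classifier = Callable[[str], str]


def majority_vote(labels: Sequence[str]) -> str:
    """Hard voting: most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def vote(base: List[Classifier], question: str) -> str:
    return majority_vote([clf(question) for clf in base])


class StackedClassifier:
    """Stacking: a meta-learner is trained on the base learners' outputs.

    Hypothetical meta-learner: remembers, for each tuple of base predictions,
    the true label that most often accompanied it in the training data.
    """

    def __init__(self, base: List[Classifier]):
        self.base = base
        self.table: Dict[Tuple[str, ...], str] = {}

    def _meta_features(self, question: str) -> Tuple[str, ...]:
        return tuple(clf(question) for clf in self.base)

    def fit(self, questions: List[str], labels: List[str]) -> "StackedClassifier":
        counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
        for q, y in zip(questions, labels):
            counts[self._meta_features(q)][y] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        return self

    def predict(self, question: str) -> str:
        feats = self._meta_features(question)
        # Unseen prediction pattern: fall back to plain majority voting.
        return self.table.get(feats, majority_vote(feats))


# --- Hypothetical toy base learners (keyword rules) ---
def wh_rule(q: str) -> str:
    if q.startswith("who"):
        return "PERSON"
    if q.startswith("where"):
        return "LOCATION"
    return "NUMERIC"


def quantity_rule(q: str) -> str:
    return "NUMERIC" if "many" in q or "much" in q else "PERSON"


def place_rule(q: str) -> str:
    return "LOCATION" if "city" in q or "river" in q else "PERSON"


BASE = [wh_rule, quantity_rule, place_rule]
train_q = ["who wrote gitanjali", "where is the padma river",
           "how many districts are there", "which city is the capital"]
train_y = ["PERSON", "LOCATION", "NUMERIC", "LOCATION"]
stacker = StackedClassifier(BASE).fit(train_q, train_y)

q = "which city hosts the fair"
print(vote(BASE, q))       # NUMERIC (three-way tie, first vote wins)
print(stacker.predict(q))  # LOCATION (meta-learner learned this pattern)
```

The example shows why stacking can beat voting: the base learners disagree three ways, so hard voting falls back on an arbitrary tie-break, while the meta-learner has learned that this particular disagreement pattern usually means LOCATION. In a realistic setup the meta-features would come from held-out (for example, cross-validated) base-learner predictions, so the meta-learner is not trained on outputs the base learners produced for their own training data.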

