CL, LG | Aug 18, 2019

TwistBytes -- Hierarchical Classification at GermEval 2019: walking the fine line (of recall and precision)

arXiv:1908.06493v10.2

Originality: Synthesis-oriented

AI Analysis

This work addresses hierarchical text classification for German blurbs, representing an incremental improvement in a specific domain task.

The authors tackled hierarchical classification of German blurbs in the GermEval 2019 shared task, achieving first place in the hierarchical subtask and second in the flat classification subtask by using TF-IDF features with SVMs and a post-processing method to handle multi-label aspects.

We present our approach to GermEval 2019 Task 1, the Shared Task on hierarchical classification of German blurbs. We achieved first place in the hierarchical subtask B and second place in subtask A, the flat classification of root nodes. In subtask A, we applied a simple multi-feature TF-IDF extraction method in which each feature extraction module used a different n-gram range and stopword removal setting. The classifier on top was a standard linear SVM. For the hierarchical classification, we used a local approach that was more lightweight than, but otherwise similar to, the one used in subtask A. The key point of our approach was a post-processing step to cope with the multi-label aspect of the task, which increased recall without letting it surpass precision.
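The abstract does not say how the post-processing step works, so the following is only a minimal sketch of one plausible reading: keep every label whose classifier score clears a threshold, always keep the top label, and lower the threshold only as long as micro-averaged recall stays at or below precision. The threshold rule, the function names, and the toy labels and scores are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical post-processing sketch. The thresholding rule, all names, and
# the toy data are illustrative assumptions, not the authors' published method.

def select_labels(scores, threshold=0.0):
    """Keep every label scoring at or above `threshold`; if none does,
    fall back to the single best label so each blurb gets a prediction."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    chosen = [label for label, score in ranked if score >= threshold]
    return chosen if chosen else [ranked[0][0]]


def micro_precision_recall(predicted, gold):
    """Micro-averaged precision and recall over lists of label lists."""
    tp = sum(len(set(p) & set(g)) for p, g in zip(predicted, gold))
    n_pred = sum(len(set(p)) for p in predicted)
    n_gold = sum(len(set(g)) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


def pick_threshold(scores_list, gold, candidates):
    """Return the lowest candidate threshold for which recall does not
    exceed precision on the given data, or None if no candidate qualifies."""
    best = None
    for t in sorted(candidates, reverse=True):
        preds = [select_labels(s, t) for s in scores_list]
        precision, recall = micro_precision_recall(preds, gold)
        if recall <= precision:
            best = t
    return best


# Toy decision scores (e.g. SVM margins) for two blurbs; labels are made up.
SCORES = [
    {"Fiction": 1.2, "Children": -0.3, "Nonfiction": -1.5},
    {"Fiction": -0.8, "Children": 0.4, "Nonfiction": -0.2},
]
GOLD = [["Fiction", "Children"], ["Children"]]

if __name__ == "__main__":
    for t in (0.0, -0.25, -0.5):
        preds = [select_labels(s, t) for s in SCORES]
        print(t, preds, micro_precision_recall(preds, GOLD))
    print("chosen threshold:", pick_threshold(SCORES, GOLD, [0.0, -0.25, -0.5]))
```

Lowering the threshold admits more labels per blurb, which tends to raise recall and lower precision. In practice the threshold would be tuned on a held-out development set, not on the data being evaluated.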
