Classifier Ensembles for Dialect and Language Variety Identification
This work addresses dialect identification, a task in computational linguistics, but its contribution is incremental: it applies existing ensemble methods to new datasets.
The paper tackles dialect and language variety identification by developing ensemble-based systems that distinguish Flemish from Dutch in subtitles, and four Arabic dialects plus Modern Standard Arabic in speech broadcasts. It reports competitive performance relative to the other submissions to the corresponding shared tasks.
In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018. We present a system developed to discriminate between Flemish and Dutch in subtitles and a system trained to discriminate between four Arabic dialects (Egyptian, Levantine, Gulf, and North African) and Modern Standard Arabic in speech broadcasts. Finally, we compare the performance of these two systems with the other systems submitted to the Discriminating between Dutch and Flemish in Subtitles (DFS) and the Arabic Dialect Identification (ADI) shared tasks at VarDial 2018.
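The abstract does not say how the ensemble combines its members, so the following is only a minimal sketch of one common scheme, plurality voting, and not necessarily the method used in the paper. The function `plurality_vote`, the three toy base classifiers, and the labels `"flemish"` and `"dutch"` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# A base classifier maps a text to a label. Hypothetical stand-in type.
Classifier = Callable[[str], str]


def plurality_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by the largest number of base classifiers.

    Ties are broken in favour of the tied label that was predicted by the
    earliest classifier in the sequence.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(text) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # Scan the votes in order so the tie-break is deterministic.
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: votes is non-empty")


# Toy base classifiers (assumptions for illustration, not real models).
def by_keyword(text: str) -> str:
    return "flemish" if "amai" in text.lower() else "dutch"


def always_dutch(text: str) -> str:
    return "dutch"


def by_length(text: str) -> str:
    return "flemish" if len(text.split()) > 3 else "dutch"


ensemble = [by_keyword, always_dutch, by_length]
prediction = plurality_vote(ensemble, "amai dat is echt heel mooi")
```

In a real system the base classifiers would be trained models, and the votes could be replaced by averaged class probabilities or fed to a meta-classifier; which of these a given ensemble uses is a design choice this sketch does not settle.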