CL, AI
Aug 1, 2022

Data Collection and Analysis of French Dialects

arXiv:2208.00752v1
h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for dialect classification in linguistics and text analytics, but it is incremental as it applies existing methods to a new dataset.

The paper tackles the problem of classifying French dialect texts from various French-speaking countries. It creates a new dataset, applies machine learning classifiers to it, and evaluates which features and classifiers perform best.

This paper discusses creating and analysing a new dataset for data mining and text analytics research, contributing to a joint Leeds University research project for the Corpus of National Dialects. It investigates machine learning classifiers for classifying samples of French dialect text from various French-speaking countries. Following the steps of the CRISP-DM methodology, it explores the data collection process, data quality issues, and data conversion for text analysis. Finally, after suitable data mining techniques have been applied, it discusses the evaluation methods, the best overall features and classifiers, and the conclusions.
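As a rough illustration of what classifying short text samples by dialect can look like, the sketch below trains a multinomial Naive Bayes model on character trigrams using only the Python standard library. The abstract does not say which features or classifiers the paper used, so the classifier, the trigram features, the class name `NaiveBayesDialectClassifier`, the labels `label_a` and `label_b`, and the four training strings are all assumptions invented for this example. None of them come from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDialectClassifier:
    """Hypothetical multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                      # n-gram length
        self.alpha = alpha              # additive (Laplace) smoothing constant
        self.label_counts = Counter()   # number of training texts per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()              # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.label_counts[label] += 1
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log(
                    (counts[gram] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: placeholder labels, not real dialect annotations.
train_texts = [
    "bonjour tout le monde",
    "bonjour le monde",
    "salut la compagnie",
    "salut la troupe",
]
train_labels = ["label_a", "label_a", "label_b", "label_b"]

model = NaiveBayesDialectClassifier().fit(train_texts, train_labels)
print(model.predict("bonjour monde"))
```

Character n-grams are used here only because they need no tokenizer or external library. A real study would compare several feature sets and classifiers on held-out data, as the abstract indicates.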

Code Implementations: 1 repo
