CL · Dec 22, 2025

Algerian Dialect

arXiv:2512.19543v1 · 10 citations · h-index: 6 · Appl Sci
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of data available to researchers working on the Algerian dialect, though the contribution is incremental because it primarily provides a new dataset.

The authors tackle the scarcity of resources for the Algerian dialect by creating a large-scale sentiment-annotated dataset of 45,000 YouTube comments, which is publicly available to support research in sentiment analysis and dialectal Arabic NLP.

We present Algerian Dialect, a large-scale sentiment-annotated dataset consisting of 45,000 YouTube comments written in the Algerian Arabic dialect. The comments were collected from more than 30 Algerian press and media channels using the YouTube Data API. Each comment is manually annotated with one of five sentiment categories: very negative, negative, neutral, positive, and very positive. In addition to sentiment labels, the dataset includes rich metadata such as collection timestamps, like counts, video URLs, and annotation dates. This dataset addresses the scarcity of publicly available resources for the Algerian dialect and aims to support research in sentiment analysis, dialectal Arabic NLP, and social media analytics. The dataset is publicly available on Mendeley Data under a CC BY 4.0 license at https://doi.org/10.17632/zzwg3nnhsz.2.
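
As a rough illustration of how the five-way labels described above might be handled in code, the following Python sketch maps each category to an ordinal score and counts labels over a set of records. The field names (`text`, `sentiment`, `like_count`, `video_url`) and the helper functions are hypothetical; the actual column names and label spellings in the released files may differ and should be checked against the Mendeley Data record.

```python
from collections import Counter
from dataclasses import dataclass

# The five sentiment categories named in the abstract,
# ordered from most negative to most positive.
SENTIMENT_LABELS = [
    "very negative",
    "negative",
    "neutral",
    "positive",
    "very positive",
]

# Ordinal encoding centred on neutral: -2, -1, 0, 1, 2.
LABEL_TO_SCORE = {label: i - 2 for i, label in enumerate(SENTIMENT_LABELS)}


@dataclass
class Comment:
    # Hypothetical field names; the released files may use different ones.
    text: str
    sentiment: str
    like_count: int
    video_url: str


def sentiment_score(label: str) -> int:
    """Map a sentiment label to an ordinal score in [-2, 2]."""
    key = label.strip().lower()
    if key not in LABEL_TO_SCORE:
        raise ValueError(f"unknown sentiment label: {label!r}")
    return LABEL_TO_SCORE[key]


def label_distribution(comments) -> Counter:
    """Count how many comments carry each (normalised) sentiment label."""
    return Counter(c.sentiment.strip().lower() for c in comments)
```

An ordinal encoding like this keeps the ordering between categories, which can matter when evaluating a model: predicting "negative" for a "very negative" comment is a smaller error than predicting "very positive". Whether to treat the task as ordinal or as plain five-class classification is a modelling choice the abstract does not prescribe.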

