CL AI

Aug 24, 2025

CultranAI at PalmX 2025: Data Augmentation for Cultural Knowledge Representation

Hunzalah Hassan Bhatti, Youssef Ahmed, Md Arid Hasan, Firoj Alam

U of Toronto

arXiv:2508.17324v2

2 citations

h-index: 17

Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Originality: Synthesis-oriented

AI Analysis

This work addresses cultural knowledge representation for Arabic speakers, but it is incremental as it builds on existing datasets and methods.

The paper tackled the problem of cultural knowledge representation for Arabic by augmenting datasets and fine-tuning large language models, resulting in a system that achieved 70.50% accuracy on a blind test set and 84.1% on a development set.

In this paper, we report our participation in the PalmX cultural evaluation shared task. Our system, CultranAI, focused on data augmentation and LoRA fine-tuning of large language models (LLMs) for Arabic cultural knowledge representation. We benchmarked several LLMs to identify the best-performing model for the task. In addition to utilizing the PalmX dataset, we augmented it by incorporating the Palm dataset and curated a new dataset of over 22K culturally grounded multiple-choice questions (MCQs). Our experiments showed that the Fanar-1-9B-Instruct model achieved the highest performance. We fine-tuned this model on the combined augmented dataset of 22K+ MCQs. On the blind test set, our submitted system ranked 5th with an accuracy of 70.50%, while on the PalmX development set, it achieved an accuracy of 84.1%.
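The two mechanical steps the abstract mentions, combining several MCQ sources into one training pool and scoring predictions by accuracy, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the `MCQ` record layout, the exact-match deduplication rule, and the helper names `merge_mcq_sets` and `accuracy` are all assumptions introduced here, and the abstract does not say how duplicates were handled or how answers were parsed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class MCQ:
    """Hypothetical record layout for one multiple-choice question."""
    question: str
    options: Tuple[str, ...]
    answer: str  # gold option label, e.g. "A"


def merge_mcq_sets(*datasets: Iterable[MCQ]) -> List[MCQ]:
    """Concatenate MCQ collections, keeping the first copy of exact duplicates.

    Assumption: two items are duplicates when the stripped question text
    and the option tuple match exactly.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for item in dataset:
            key = (item.question.strip(), item.options)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def accuracy(predictions: Sequence[str], items: Sequence[MCQ]) -> float:
    """Return the percentage of predicted labels that match the gold labels."""
    if len(predictions) != len(items):
        raise ValueError("predictions and items must have the same length")
    if not items:
        return 0.0
    correct = sum(
        pred.strip().upper() == item.answer.strip().upper()
        for pred, item in zip(predictions, items)
    )
    return 100.0 * correct / len(items)
```

In this sketch, labels are compared case-insensitively after stripping whitespace. A real evaluation would need to follow whatever answer format the shared task defines.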
