CV, AI, CL, HC, LG | Jan 31, 2025

AIN: The Arabic INclusive Large Multimodal Model

arXiv:2502.00094v2 | 10 citations | h-index: 35
Originality: Incremental advance
AI Analysis

This paper addresses the shortage of multimodal AI tools for Arabic speakers. The contribution is incremental, since it extends existing LMM approaches to a new language context.

The paper tackles the lack of advanced Arabic large multimodal models (LMMs) by introducing AIN, a bilingual English-Arabic LMM. AIN achieves state-of-the-art Arabic performance, and its 7B model outperforms GPT-4o by an absolute 3.4% on CAMEL-Bench, averaged over eight domains and 38 sub-domains.

Amid the swift progress of large language models (LLMs) and their evolution into large multimodal models (LMMs), significant strides have been made in high-resource languages such as English and Chinese. While Arabic LLMs have seen notable progress, Arabic LMMs remain largely unexplored, and existing ones often focus narrowly on a few specific aspects of language and visual understanding. To bridge this gap, we introduce AIN, the Arabic Inclusive Multimodal Model, an English-Arabic bilingual LMM designed to excel across diverse domains in both languages. AIN leverages 3.6 million carefully constructed, high-quality Arabic-English multimodal data samples. It demonstrates state-of-the-art Arabic performance while also possessing strong English-language visual capabilities. On the recent CAMEL-Bench benchmark, which comprises 38 sub-domains including multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding, the AIN 7B model outperforms GPT-4o by an absolute gain of 3.4%, averaged over eight domains and 38 sub-domains. AIN's superior capabilities position it as a significant step toward empowering Arabic speakers with advanced multimodal generative AI tools across diverse applications.

Code Implementations: 1 repo