CL, Feb 20, 2024
Normalized Orthography for Tunisian Arabic
Houcemeddine Turki, Kawthar Ellouze, Hager Ben Ammar et al.
Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of CODA* guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource development by ensuring user-friendliness and consistency. The updated standard addresses challenges in accurately representing Tunisian phonology and morphology, correcting issues from transcriptions based on Modern Standard Arabic.
CL, Jan 24, 2024
Text Categorization Can Enhance Domain-Agnostic Stopword Extraction
Houcemeddine Turki, Naome A. Etori, Mohamed Ali Hadj Taieb et al.
This paper investigates the role of text categorization in streamlining stopword extraction in natural language processing (NLP), focusing on nine African languages alongside French. By leveraging the MasakhaNEWS, African Stopwords Project, and MasakhaPOS datasets, our findings emphasize that text categorization effectively identifies domain-agnostic stopwords, with a detection success rate of over 80% for most examined languages. Nevertheless, linguistic variances result in lower detection rates for certain languages. Interestingly, we find that while over 40% of stopwords are common across news categories, less than 15% are unique to a single category. Uncommon stopwords add depth to text, but their classification as stopwords depends on context. Therefore, combining statistical and linguistic approaches creates comprehensive stopword lists, highlighting the value of our hybrid method. This research enhances NLP for African languages and underscores the importance of text categorization in stopword extraction.
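As a rough illustration of the idea that stopwords shared across news categories are candidates for domain-agnostic stopwords, the sketch below intersects the most frequent tokens of each category. It is not the paper's method: the function names, the whitespace tokenizer, the cutoff `k`, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def top_tokens(docs, k):
    """Return the k most frequent lowercase tokens across a list of documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {tok for tok, _ in counts.most_common(k)}


def domain_agnostic_candidates(docs_by_category, k=3):
    """Tokens that rank in the top k of every category (hypothetical heuristic)."""
    per_category = [top_tokens(docs, k) for docs in docs_by_category.values()]
    return set.intersection(*per_category) if per_category else set()


# Toy corpus, invented for illustration only.
corpus = {
    "sports": ["the team of the city won", "the coach of the team"],
    "health": ["the doctor of the clinic", "the clinic of the city"],
}
candidates = domain_agnostic_candidates(corpus, k=3)
```

Tokens that rank highly in only one category (here `team` or `clinic`) are dropped by the intersection, which mirrors the abstract's distinction between stopwords common across categories and those unique to a single one.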
CV, Jan 21, 2024
The State of Computer Vision Research in Africa
Abdul-Hakeem Omotayo, Ashery Mbilinyi, Lukman Ismaila et al.
Despite significant efforts to democratize artificial intelligence (AI), computer vision, which is a sub-field of AI, still lags in Africa. A significant contributing factor is the limited access to computing resources, datasets, and collaborations. As a result, Africa's contribution to top-tier publications in this field has only been 0.06% over the past decade. Towards improving the computer vision field and making it more accessible and inclusive, this study analyzes 63,000 Scopus-indexed computer vision publications from Africa. We utilize large language models to automatically parse their abstracts and to identify and categorize topics and datasets. This resulted in a list of more than 100 African datasets. Our objective is to provide a comprehensive taxonomy of dataset categories to facilitate better understanding and utilization of these resources. We also analyze collaboration trends of researchers within and outside the continent. Additionally, we conduct a large-scale questionnaire among African computer vision researchers to identify the structural barriers they believe require urgent attention. In conclusion, our study offers a comprehensive overview of the current state of computer vision research in Africa, to empower marginalized communities to participate in the design and development of computer vision systems.
CV, May 11, 2023
Towards a Better Understanding of the Computer Vision Research Community in Africa
Abdul-Hakeem Omotayo, Mai Gamal, Eman Ehab et al.
Computer vision is a broad field of study that encompasses different tasks (e.g., object detection). Although computer vision is relevant to African communities in various applications, computer vision research is under-explored on the continent and constitutes only 0.06% of top-tier publications in the last ten years. In this paper, our goal is to have a better understanding of the computer vision research conducted in Africa and provide pointers on whether there is equity in research or not. We do this through an empirical analysis of the African computer vision publications that are Scopus indexed, where we collect around 63,000 publications over the period 2012-2022. We first study the opportunities available for African institutions to publish in top-tier computer vision venues. We show that African publishing trends in top-tier venues over the years do not exhibit consistent growth, unlike other continents such as North America or Asia. Moreover, we study all computer vision publications beyond top-tier venues in different African regions and find that mainly Northern and Southern Africa are publishing in computer vision, with 68.5% and 15.9% of publications, respectively. Nonetheless, we highlight that both Eastern and Western Africa are exhibiting a promising increase, with the last two years closing the gap with Southern Africa. Additionally, we study the collaboration patterns in these publications and find that most of them exhibit international collaborations rather than African ones. We also show that most of these publications include an African author who is a key contributor as the first or last author. Finally, we present the most recurring keywords in computer vision publications per African region.
LG, Oct 30, 2020
Knowledge-Based Construction of Confusion Matrices for Multi-Label Classification Algorithms using Semantic Similarity Measures
Houcemeddine Turki, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha
So far, multi-label classification algorithms have been evaluated using statistical methods that do not consider the semantics of the classes involved and that fully depend on abstract computations such as Bayesian reasoning. Currently, there are several attempts to develop ontology-based methods for a better assessment of supervised classification algorithms. In this research paper, we define a novel approach that aligns expected labels with predicted labels in multi-label classification using ontology-driven feature-based semantic similarity measures, and we use it to develop a method for creating precise confusion matrices for a more effective evaluation of multi-label classification algorithms.
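To make the idea of aligning expected and predicted labels by semantic similarity more concrete, here is a minimal sketch. It does not reproduce the paper's algorithm: the `FEATURES` table stands in for ontology-derived features, the similarity is a plain Jaccard ratio over those feature sets, and the greedy pairing strategy and all function names are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical feature sets standing in for features drawn from an ontology.
FEATURES = {
    "cat": {"animal", "mammal", "feline"},
    "dog": {"animal", "mammal", "canine"},
    "car": {"vehicle", "wheeled"},
}


def similarity(a, b):
    """Jaccard overlap of feature sets (one simple feature-based measure)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)


def align(expected, predicted):
    """Pair expected with predicted labels: exact matches first, then greedily by similarity."""
    exp, pred = set(expected), set(predicted)
    pairs = [(label, label) for label in sorted(exp & pred)]
    exp_left, pred_left = sorted(exp - pred), sorted(pred - exp)
    # Rank every remaining (expected, predicted) combination, most similar first.
    ranked = sorted(
        ((similarity(e, p), e, p) for e in exp_left for p in pred_left),
        reverse=True,
    )
    used_e, used_p = set(), set()
    for _, e, p in ranked:
        if e not in used_e and p not in used_p:
            pairs.append((e, p))
            used_e.add(e)
            used_p.add(p)
    # Labels with no counterpart are recorded against None.
    pairs += [(e, None) for e in exp_left if e not in used_e]
    pairs += [(None, p) for p in pred_left if p not in used_p]
    return pairs


def confusion_counts(samples):
    """Count (expected, predicted) pairs over all samples; a sparse confusion matrix."""
    counts = Counter()
    for expected, predicted in samples:
        counts.update(align(expected, predicted))
    return counts


samples = [({"cat", "car"}, {"dog", "car"}), ({"dog"}, {"dog"})]
matrix = confusion_counts(samples)
```

In the first toy sample, `car` is matched exactly, so the remaining expected label `cat` is paired with the remaining predicted label `dog`, and the confusion is recorded in the `("cat", "dog")` cell rather than being spread across unrelated classes.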