CL · Apr 19, 2023
MasakhaNEWS: News Topic Classification for African languages
David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime et al. · mila
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) by leveraging the PET approach.
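The "more than 90%" claim can be checked directly from the two F1 figures quoted in the abstract. The snippet below is only an arithmetic check; the variable names are illustrative and are not taken from the paper.

```python
# F1 scores as quoted in the abstract above.
pet_few_shot_f1 = 86.0      # PET with 10 examples per label
full_supervised_f1 = 92.6   # full supervised training

# Fraction of the fully supervised performance recovered in the few-shot setting.
ratio = pet_few_shot_f1 / full_supervised_f1
print(f"{ratio:.1%}")  # 92.9%
```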
CL · Nov 16, 2023
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang, David Ifeoluwa Adelani, Sweta Agrawal et al.
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
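Spearman-rank correlation, the figure of merit reported above, is the Pearson correlation computed on the ranks of the two score lists. The following is a minimal sketch of that computation in plain Python. The scores are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical metric scores and human direct-assessment ratings
# for five translations (illustrative only).
metric_scores = [0.71, 0.42, 0.88, 0.35, 0.60]
human_da = [70, 55, 90, 40, 50]

rho = spearman(metric_scores, human_da)
print(round(rho, 3))
```

A value near 1 means the metric orders translations almost exactly as the human raters do; a value near 0 means the two orderings are essentially unrelated.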
CY · May 24, 2023
Trends and Challenges Towards an Effective Data-Driven Decision Making in UK SMEs: Case Studies and Lessons Learnt from the Analysis of 85 SMEs
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor et al.
The adoption of data science brings vast benefits to Small and Medium-sized Enterprises (SMEs), including business productivity, economic growth, innovation and job creation. Data science can support SMEs to optimise production processes, anticipate customers' needs, predict machinery failures and deliver efficient smart services. Businesses can also harness the power of Artificial Intelligence (AI) and Big Data and the smart use of digital technologies to enhance productivity and performance, paving the way for innovation. However, integrating data science decisions into an SME requires both skills and IT investments. In most cases, such expenses are beyond the means of SMEs due to limited resources and restricted access to financing. This paper presents trends and challenges towards effective data-driven decision making for organisations, based on a case study of 85 SMEs, mostly from the West Midlands region of England. The work is supported as part of a 3-year ERDF (European Regional Development Fund) project in the areas of big data management, analytics and business intelligence. We present two case studies that demonstrate the potential of digitisation, AI and machine learning, and use these as examples to unveil challenges and showcase the wealth of currently available opportunities for SMEs.
CY · Feb 14, 2020
Trends of digitalization and adoption of big data & analytics among UK SMEs: Analysis and lessons drawn from a case study of 53 SMEs
Muhidin Mohamed, Philip Weber
Small and Medium Enterprises (SMEs) now generate digital data at an unprecedented rate from online transactions, social media marketing and associated customer interactions, online product or service reviews and feedback, clinical diagnosis, Internet of Things (IoT) sensors, and production processes. All these forms of data can be transformed into monetary value if put into a proper data value chain. This requires both skills and IT investments for the long-term benefit of businesses. However, such spending is beyond the capacity of most SMEs due to their limited resources and restricted access to finances. This paper presents lessons learned from a case study of 53 UK SMEs, mostly from the West Midlands region of England, supported as part of a 3-year ERDF project, Big Data Corridor, in the areas of big data management, analytics and related IT issues. Based on our study's sample companies, several perspectives including the digital technology trends, challenges facing the UK SMEs, and the state of their adoption in data analytics and big data, are presented in the paper.