CLJun 2, 2025
Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic SettingsMuhammad Islam, Javed Ali Khan, Mohammed Abaker et al.
The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional languages, such as Urdu. Advanced Fake News Detection (FND) techniques rely heavily on large, accurately labeled datasets. However, FND in under-resourced languages like Urdu faces substantial challenges due to the scarcity of extensive corpora and the lack of validated lexical resources. Current Urdu fake news datasets are often domain-specific and inaccessible to the public. They also lack human verification, relying mainly on unverified English-to-Urdu translations, which compromises their reliability in practical applications. This study highlights the necessity of developing reliable, expert-verified, and domain-independent Urdu-enhanced FND datasets to improve fake news detection in Urdu and other resource-constrained languages. This paper presents the first benchmark large FND dataset for Urdu news, which is publicly available for validation and deep analysis. We also evaluate this dataset using multiple state-of-the-art pre-trained large language models (LLMs), such as XLNet, mBERT, XLM-RoBERTa, RoBERTa, DistilBERT, and DeBERTa. Additionally, we propose a unified LLM model that outperforms the others with different embedding and feature extraction techniques. The performance of these models is compared based on accuracy, F1 score, precision, recall, and human judgment for vetting the sample results of news.
NIApr 14, 2020
Issues and challenges in Cloud Storage Architecture: A SurveyAnwar Ghani, Afzal Badshah, Saeedullah Jan et al.
From home appliances to industrial enterprises, the Information and Communication Technology (ICT) industry is revolutionizing the world. We are witnessing the emergence of new technologies (e.g, Cloud computing, Fog computing, Internet of Things (IoT), Artificial Intelligence (AI) and Block-chain) which proves the growing use of ICT (e,g. business, education, health, and home appliances), resulting in massive data generation. It is expected that more than 175 ZB data will be processed annually by 75 billion devices by 2025. The 5G technology (i.e. mobile communication technology) dramatically increases network speed, enabling users to upload ultra high definition videos in real-time, which will generate a massive stream of big data. Furthermore, smart devices, having artificial intelligence, will act like a human being (e.g, a self-driving vehicle, etc) on the network, will also generate big data. This sudden shift and massive data generation created serious challenges in storing and managing heterogeneous data at such a large scale. This article presents a state-of-the-art review of the issues and challenges involved in storing heterogeneous big data, their countermeasures (i.e, from security and management perspectives), and future opportunities of cloud storage. These challenges are reviewed in detail and new dynamics for researchers in the field of cloud storage are discovered.
SIApr 14, 2020
Author Name Disambiguation in Bibliographic Databases: A SurveyMuhammad Shoaib, Ali Daud, Tehmina Amjad
Entity resolution is a challenging and hot research area in the field of Information Systems since last decade. Author Name Disambiguation (AND) in Bibliographic Databases (BD) like DBLP , Citeseer , and Scopus is a specialized field of entity resolution. Given many citations of underlying authors, the AND task is to find which citations belong to the same author. In this survey, we start with three basic AND problems, followed by need for solution and challenges. A generic, five-step framework is provided for handling AND issues. These steps are; (1) Preparation of dataset (2) Selection of publication attributes (3) Selection of similarity metrics (4) Selection of models and (5) Clustering Performance evaluation. Categorization and elaboration of similarity metrics and methods are also provided. Finally, future directions and recommendations are given for this dynamic area of research.