CL Apr 8
SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
Usman Naseem, Robert Geislinger, Juan Ren et al.
We present SemEval-2026 Task 9, a shared task on online polarization detection, covering 22 languages and comprising over 110K annotated instances. Each data instance is multi-labeled with the presence of polarization, polarization type, and polarization manifestation. Participants were asked to predict labels in three subtasks: (1) detecting the presence of polarization, (2) identifying the type of polarization, and (3) recognizing the polarization manifestation. The three subtasks attracted over 1,000 participants worldwide and more than 10K submissions on Codabench. We received final submissions from 67 teams and 73 system description papers. We report the baseline results and analyze the performance of the best-performing systems, highlighting the most common approaches and the most effective methods across different subtasks and languages. The dataset of this task is publicly available.
LG Mar 18, 2022
Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition
Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza
Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR), which is computationally demanding and time-consuming. We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR. We discover that the dataset pruning strategies used in vision tasks for sampling the most informative examples do not perform better than random subset selection on fine-tuning self-supervised ASR. We then present the COWERAGE algorithm for representative subset selection in self-supervised ASR. COWERAGE is based on our finding that ensuring the coverage of examples based on training Word Error Rate (WER) in the early training epochs leads to better generalization performance. Extensive experiments with the wav2vec 2.0 and HuBERT models on the TIMIT, Librispeech, and LJSpeech datasets show the effectiveness of COWERAGE and its transferability across models, with up to 17% relative WER improvement over existing dataset pruning methods and random sampling. We also demonstrate that the coverage of training instances in terms of WER values ensures the inclusion of phonemically diverse examples, leading to better test accuracy in self-supervised speech recognition models.
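The coverage idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' COWERAGE implementation: the function name `coverage_subset`, the equal-frequency bucketing, and the even per-bucket quota are all assumptions made for illustration. The only thing taken from the abstract is the goal of covering the range of per-example training WER values rather than keeping only the hardest or easiest examples.

```python
import random


def coverage_subset(wers, subset_size, num_buckets=4, seed=0):
    """Pick example indices so that the full range of training WER is covered.

    Hypothetical helper: bucketing scheme and quotas are assumptions.
    """
    rng = random.Random(seed)
    # Sort example indices by their training WER (lowest first).
    order = sorted(range(len(wers)), key=lambda i: wers[i])
    n = len(order)
    # Cut the sorted indices into contiguous, roughly equal-sized buckets.
    buckets = [order[b * n // num_buckets:(b + 1) * n // num_buckets]
               for b in range(num_buckets)]
    chosen = []
    for b, bucket in enumerate(buckets):
        # Spread the selection budget as evenly as possible across buckets.
        quota = subset_size // num_buckets + (1 if b < subset_size % num_buckets else 0)
        chosen.extend(rng.sample(bucket, min(quota, len(bucket))))
    return sorted(chosen)
```

Calling `coverage_subset(wers, subset_size=20)` on a list of per-utterance WER values returns 20 indices drawn evenly from low-, mid-, and high-WER regions, in contrast to a pruning rule that would keep only one end of the distribution.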
NI Jul 5, 2024
Rethinking Image Compression on the Web with Generative AI
Shayan Ali Hassan, Danish Humair, Ihsan Ayyub Qazi et al.
The rapid growth of the Internet, driven by social media, web browsing, and video streaming, has made images central to the Web experience, resulting in significant data transfer and increased webpage sizes. Traditional image compression methods, while reducing bandwidth, often degrade image quality. This paper explores a novel approach using generative AI to reconstruct images at the edge or client-side. We develop a framework that leverages text prompts and provides additional conditioning inputs like Canny edges and color palettes to a text-to-image model, achieving up to 99.8% bandwidth savings in the best cases and 92.6% on average, while maintaining high perceptual similarity. Empirical analysis and a user study show that our method preserves image meaning and structure more effectively than traditional compression methods, offering a promising solution for reducing bandwidth usage and improving Internet affordability with minimal degradation in image quality.
LG Jul 31, 2025
TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses
Muhammad Taha Cheema, Abeer Aamir, Khawaja Gul Muhammad et al.
Large Language Models (LLMs) process millions of queries daily, making efficient response caching a compelling optimization for reducing cost and latency. However, preserving relevance to user queries using this approach proves difficult due to the personalized nature of chatbot interactions and the limited accuracy of semantic similarity search. To address this, we present TweakLLM, a novel routing architecture that employs a lightweight LLM to dynamically adapt cached responses to incoming prompts. Through comprehensive evaluation, including user studies with side-by-side comparisons, satisfaction voting, as well as multi-agent LLM debates, we demonstrate that TweakLLM maintains response quality comparable to frontier models while significantly improving cache effectiveness. Our results across real-world datasets highlight TweakLLM as a scalable, resource-efficient caching solution for high-volume LLM deployments without compromising user experience.
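The routing idea described above can be sketched roughly as follows. This is an illustrative toy, not the TweakLLM system itself: the class name `TweakRouter`, the bag-of-words `embed` function, the single similarity `threshold` of 0.6, and the callable signatures for the two models are all assumptions. What it takes from the abstract is the flow: look up a semantically similar cached response, and if one exists, have a lightweight model adapt it to the new prompt instead of calling the large model.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TweakRouter:
    """Hypothetical router: cache hit -> small model tweaks, miss -> large model."""

    def __init__(self, big_llm, small_llm, threshold=0.6):
        self.big_llm = big_llm        # callable: prompt -> response
        self.small_llm = small_llm    # callable: (prompt, cached_prompt, cached_response) -> response
        self.threshold = threshold    # assumed similarity cut-off
        self.cache = []               # list of (embedding, prompt, response)

    def answer(self, prompt):
        query = embed(prompt)
        best = max(self.cache, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            # Cache hit: the lightweight model adapts the cached response.
            return "tweaked", self.small_llm(prompt, best[1], best[2])
        # Cache miss: call the large model and store its answer for reuse.
        response = self.big_llm(prompt)
        self.cache.append((query, prompt, response))
        return "fresh", response
```

In this sketch the threshold trades cost against relevance: a lower value routes more traffic to the cheap model but asks it to bridge larger gaps between the cached and incoming prompts.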
CL Mar 14, 2024
To Label or Not to Label: Hybrid Active Learning for Neural Machine Translation
Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza
Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller representative subsets from unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select instances with the highest model uncertainty. Both approaches have limitations: diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose Hybrid Uncertainty and Diversity Sampling (HUDS), an AL strategy for domain adaptation in NMT that combines uncertainty and diversity for sentence selection. HUDS computes uncertainty scores for unlabeled sentences and subsequently stratifies them. It then clusters sentence embeddings within each stratum and computes diversity scores by distance to the centroid. A weighted hybrid score that combines uncertainty and diversity is then used to select the top instances for annotation in each AL iteration. Experiments on multi-domain German-English and French-English datasets demonstrate the better performance of HUDS over other strong AL baselines. We analyze the sentence selection with HUDS and show that it prioritizes diverse instances having high model uncertainty for annotation in early AL iterations.
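The selection steps listed in this abstract (score uncertainty, stratify, measure distance to a centroid, combine with a weight, take the top instances) can be sketched as below. This is a simplified illustration rather than the authors' code: the function name `huds_select`, the equal-sized strata, the use of a single centroid per stratum in place of a full clustering step, the min-max normalisation, and the default `weight` of 0.5 are all assumptions.

```python
import math


def huds_select(uncertainty, embeddings, k, num_strata=2, weight=0.5):
    """Return the indices of the k sentences with the highest hybrid score."""
    n = len(uncertainty)
    # 1. Stratify: sort by uncertainty and cut into equal-sized strata.
    order = sorted(range(n), key=lambda i: uncertainty[i])
    strata = [order[s * n // num_strata:(s + 1) * n // num_strata]
              for s in range(num_strata)]
    # 2. Diversity: distance of each embedding to its stratum centroid
    #    (assumption: one centroid per stratum instead of several clusters).
    diversity = [0.0] * n
    for stratum in strata:
        if not stratum:
            continue
        dim = len(embeddings[stratum[0]])
        centroid = [sum(embeddings[i][d] for i in stratum) / len(stratum)
                    for d in range(dim)]
        for i in stratum:
            diversity[i] = math.dist(embeddings[i], centroid)

    # 3. Hybrid score on min-max normalised components.
    def normalise(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    u, d = normalise(uncertainty), normalise(diversity)
    scores = [weight * u[i] + (1 - weight) * d[i] for i in range(n)]
    # 4. Select the top-k instances for annotation.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
```

Setting `weight=1.0` reduces the sketch to pure uncertainty sampling and `weight=0.0` to pure diversity sampling, which makes the role of the hybrid weight easy to inspect.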
LG Oct 27, 2021
FedPrune: Towards Inclusive Federated Learning
Muhammad Tahir Munir, Muhammad Mustansar Saeed, Mahad Ali et al.
Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example, slow clients get dropped in FL schemes such as Federated Averaging (FedAvg), which not only limits overall learning but also biases results towards fast clients. We propose FedPrune, a system that tackles this challenge by pruning the global model for slow clients based on their device characteristics. By doing so, slow clients can train a small model quickly and participate in FL, which increases test accuracy as well as fairness. Using insights from the Central Limit Theorem, FedPrune incorporates a new aggregation technique that achieves robust performance over non-IID data. Experimental evaluation shows that FedPrune provides robust convergence and better fairness compared to Federated Averaging.
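As a rough illustration of handing slow clients a smaller model, the sketch below prunes a flat weight vector according to a device-dependent keep fraction. The abstract does not state which pruning criterion or sizing rule FedPrune uses, so both are assumptions here: magnitude pruning is swapped in as a common stand-in, and `keep_fraction_for` is a hypothetical rule that scales model size with relative device speed. The aggregation technique mentioned in the abstract is not modelled.

```python
def keep_fraction_for(device_speed, fastest_speed):
    """Hypothetical rule: slower devices receive proportionally smaller models."""
    return min(1.0, device_speed / fastest_speed)


def prune_for_client(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (assumed criterion)."""
    keep = max(1, round(len(weights) * keep_fraction))
    # Indices of the `keep` weights with the largest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]
```

For example, a client running at half the speed of the fastest device would get `keep_fraction_for(1.0, 2.0)`, that is 0.5, and would train only the half of the weights that survive pruning.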
HC Jun 17, 2021
Investigating Misinformation Dissemination on Social Media in Pakistan
Danyal Haroon, Hammad Arif, Ahmed Abdullah Tariq et al.
Fake news and misinformation are among the most significant challenges brought about by advances in communication technologies. We chose to research the spread of fake news in Pakistan because of some unfortunate incidents that took place during 2020. These included the downplaying of the severity of the COVID-19 pandemic, and protests by right-wing political movements. We observed that fake news and misinformation contributed significantly to these events and especially affected low-literate and low-income populations. We conducted a cross-platform comparison of misinformation on WhatsApp, Twitter and YouTube with a primary focus on messages shared in public WhatsApp groups, and analysed the characteristics of misinformation, the techniques used to make it believable, and how users respond to it. To the best of our knowledge, this is the first attempt to compare misinformation on all three platforms in Pakistan. Data collected over a span of eight months helped us identify fake news and misinformation related to politics, religion and health, among other categories. Common elements used by fake news creators in Pakistan to make false content seem believable included appeals to emotion, conspiracy theories, political and religious polarization, incorrect facts and impersonation of credible sources.