CVFeb 23, 2023Code
Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-IdentificationGerard Maggiolino, Adnan Ahmad, Jinkun Cao et al. · cmu
Motion-based association for Multi-Object Tracking (MOT) has recently re-achieved prominence with the rise of powerful object detectors. Despite this, little work has been done to incorporate appearance cues beyond simple heuristic models that lack robustness to feature degradation. In this paper, we propose a novel way to leverage objects' appearances to adaptively integrate appearance matching into existing high-performance motion-based methods. Building upon the pure motion-based method OC-SORT, we achieve 1st place on MOT20 and 2nd place on MOT17 with 63.9 and 64.9 HOTA, respectively. We also achieve 61.3 HOTA on the challenging DanceTrack benchmark as a new state-of-the-art even compared to more heavily-designed methods. The code and models are available at \url{https://github.com/GerardMaggiolino/Deep-OC-SORT}.
AIMay 23
Partner-Aware Hierarchical Skill Discovery for Robust Human-AI CollaborationAdnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo
Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behaviors. Conventional Deep Hierarchical Reinforcement Learning (DHRL) methods focus on agent-centric rewards and overlook partner behavior, leading to shortcut learning, where skills exploit spurious information instead of adapting to partners' dynamic behaviors. This limitation undermines agents' ability to adapt and coordinate effectively with novel partners. We introduce Partner-Aware Skill Discovery (PASD), a DHRL framework that learns skills conditioned on partner behavior. PASD introduces a contrastive intrinsic reward to capture patterns emerging from partner interactions, aligning skill representations across similar partners while maintaining discriminability across diverse strategies. By structuring the skill space based on partner interactions, this approach mitigates shortcut learning and promotes behavioral consistency, enabling robust and adaptive coordination. We extensively evaluate PASD in the Overcooked-AI benchmark with a diverse population of partners characterized by varying skill levels and play styles. We further evaluate the approach with human proxy models trained from human-human gameplay trajectories. PASD consistently outperforms existing population-based and hierarchical baselines, demonstrating transferable skill learning that generalizes across a wide range of partner behaviors. Analysis of learned skill representations shows that PASD adapts effectively to diverse partner behaviors, highlighting its robustness in human-AI collaboration.
AIMay 23
Adaptive Human-AI Coordination via Hierarchical Action DisentanglementAdnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo
Human-AI collaboration requires agents that can adapt to diverse partner behaviors and skill levels while remaining robust to unseen partners. Existing methods often collapse to a single dominant behavior or learn poorly aligned skills, limiting effective coordination. We propose Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning (DHRL) framework that learns distinct, partner-aware low-level action sequences conditioned on high-level latent skills. IAD introduces an intrinsic reward that explicitly encourages disentangled action distributions of the agent's low-level policy across skills, yielding an interpretable mapping between high-level decisions and partner-specific behavioral responses. By capturing temporally extended interaction patterns, IAD enables flexible adaptation to heterogeneous partner dynamics under distributional shift. We evaluate IAD in the Overcooked-AI domain across multiple layouts and diverse partner settings, including unseen simulated partners, a human-proxy model trained on human-human gameplay, and real human partners. Results show that IAD consistently outperforms strong baselines and achieves more reliable, adaptive coordination across all settings.
LGJan 16
DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order InformationAdnan Ahmad, Chiara Boldrini, Lorenzo Valerio et al.
Decentralized Federated Learning (DFL) is a serverless collaborative machine learning paradigm where devices collaborate directly with neighbouring devices to exchange model information for learning a generalized model. However, variations in individual experiences and different levels of device interactions lead to data and model initialization heterogeneities across devices. Such heterogeneities leave variations in local model parameters across devices that leads to slower convergence. This paper tackles the data and model heterogeneity by explicitly addressing the parameter level varying evidential credence across local models. A novel aggregation approach is introduced that captures these parameter variations in local models and performs robust aggregation of neighbourhood local updates. Specifically, consensus weights are generated via approximation of second-order information of local models on their local datasets. These weights are utilized to scale neighbourhood updates before aggregating them into global neighbourhood representation. In extensive experiments with computer vision tasks, the proposed approach shows strong generalizability of local models at reduced communication costs.
CLMay 24, 2022
Building an Effective Automated Assessment System for C/C++ Introductory Programming Courses in ODL EnvironmentMuhammad Salman Khan, Adnan Ahmad, Muhammad Humayoun
Assessments help in evaluating the knowledge gained by a learner at any specific point as well as in continuous improvement of the curriculum design and the whole learning process. However, with the increase in students' enrollment at University level in either conventional or distance education environment, traditional ways of assessing students' work are becoming insufficient in terms of both time and effort. In distance education environment, such assessments become additionally more challenging in terms of hefty remuneration for hiring large number of tutors. The availability of automated tools to assist the evaluation of students' work and providing students with appropriate and timely feedback can really help in overcoming these problems. We believe that building such tools for assessing students' work for all kinds of courses in not yet possible. However, courses that involve some formal language of expression can be automated, such as, programming courses in Computer Science (CS) discipline. Instructors provide various practical exercises to students as assignments to build these skills. Usually, instructors manually grade and provide feedbacks on these assignments. Although in literature, various tools have been reported to automate this process, but most of these tools have been developed by the host institutions themselves for their own use. We at COMSATS Institute of Information Technology, Lahore are conducting a pioneer effort in Pakistan to automate the marking of assignments of introductory programming courses that involve C or C++ languages with the capability of associating appropriate feedbacks for students. In this paper, we basically identify different components that we believe are necessary in building an effective automated assessment system in the context of introductory programming courses that involve C/C++ programming.
CLSep 12, 2025Code
Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMsAdnan Ahmad, Philine Kowol, Stefan Hillmann et al.
In this paper, we provide an extensive analysis of multi-label intent classification using Large Language Models (LLMs) that are open-source, publicly available, and can be run in consumer hardware. We use the MultiWOZ 2.1 dataset, a benchmark in the dialogue system domain, to investigate the efficacy of three popular open-source pre-trained LLMs, namely LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B. We perform the classification task in a few-shot setup, giving 20 examples in the prompt with some instructions. Our approach focuses on the differences in performance of these models across several performance metrics by methodically assessing these models on multi-label intent classification tasks. Additionally, we compare the performance of the instruction-based fine-tuning approach with supervised learning using the smaller transformer model BertForSequenceClassification as a baseline. To evaluate the performance of the models, we use evaluation metrics like accuracy, precision, and recall as well as micro, macro, and weighted F1 score. We also report the inference time, VRAM requirements, etc. The Mistral-7B-v0.1 outperforms two other generative models on 11 intent classes out of 14 in terms of F-Score, with a weighted average of 0.50. It also has relatively lower Humming Loss and higher Jaccard Similarity, making it the winning model in the few-shot setting. We find BERT based supervised classifier having superior performance compared to the best performing few-shot generative LLM. The study provides a framework for small open-source LLMs in detecting complex multi-intent dialogues, enhancing the Natural Language Understanding aspect of task-oriented chatbots.
CVMay 13, 2025
Robust Emotion Recognition via Bi-Level Self-Supervised Continual LearningAdnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo
Emotion recognition through physiological signals such as electroencephalogram (EEG) has become an essential aspect of affective computing and provides an objective way to capture human emotions. However, physiological data characterized by cross-subject variability and noisy labels hinder the performance of emotion recognition models. Existing domain adaptation and continual learning methods struggle to address these issues, especially under realistic conditions where data is continuously streamed and unlabeled. To overcome these limitations, we propose a novel bi-level self-supervised continual learning framework, SSOCL, based on a dynamic memory buffer. This bi-level architecture iteratively refines the dynamic buffer and pseudo-label assignments to effectively retain representative samples, enabling generalization from continuous, unlabeled physiological data streams for emotion recognition. The assigned pseudo-labels are subsequently leveraged for accurate emotion prediction. Key components of the framework, including a fast adaptation module and a cluster-mapping module, enable robust learning and effective handling of evolving data streams. Experimental validation on two mainstream EEG tasks demonstrates the framework's ability to adapt to continuous data streams while maintaining strong generalization across subjects, outperforming existing approaches.
CLFeb 18, 2025
Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language ModelsAdnan Ahmad, Stefan Hillmann, Sebastian Möller
In this study, we explore the application of Large Language Models (LLMs) for generating synthetic users and simulating user conversations with a task-oriented dialogue system and present detailed results and their analysis. We propose a comprehensive novel approach to user simulation technique that uses LLMs to create diverse user profiles, set goals, engage in multi-turn dialogues, and evaluate the conversation success. We employ two proprietary LLMs, namely GPT-4o and GPT-o1 (Achiam et al., 2023), to generate a heterogeneous base of user profiles, characterized by varied demographics, multiple user goals, different conversational styles, initial knowledge levels, interests, and conversational objectives. We perform a detailed analysis of the user profiles generated by LLMs to assess the diversity, consistency, and potential biases inherent in these LLM-generated user simulations. We find that GPT-o1 generates more heterogeneous user distribution across most user attributes, while GPT-4o generates more skewed user attributes. The generated set of user profiles are then utilized to simulate dialogue sessions by interacting with a task-oriented dialogue system.
AIJul 25, 2025
Adaptive XAI in High Stakes Environments: Modeling Swift Trust with Multimodal Feedback in Human AI TeamsNishani Fernando, Bahareh Nakisa, Adnan Ahmad et al.
Effective human-AI teaming heavily depends on swift trust, particularly in high-stakes scenarios such as emergency response, where timely and accurate decision-making is critical. In these time-sensitive and cognitively demanding settings, adaptive explainability is essential for fostering trust between human operators and AI systems. However, existing explainable AI (XAI) approaches typically offer uniform explanations and rely heavily on explicit feedback mechanisms, which are often impractical in such high-pressure scenarios. To address this gap, we propose a conceptual framework for adaptive XAI that operates non-intrusively by responding to users' real-time cognitive and emotional states through implicit feedback, thereby enhancing swift trust in high-stakes environments. The proposed adaptive explainability trust framework (AXTF) leverages physiological and behavioral signals, such as EEG, ECG, and eye tracking, to infer user states and support explanation adaptation. At its core is a multi-objective, personalized trust estimation model that maps workload, stress, and emotion to dynamic trust estimates. These estimates guide the modulation of explanation features enabling responsive and personalized support that promotes swift trust in human-AI collaboration. This conceptual framework establishes a foundation for developing adaptive, non-intrusive XAI systems tailored to the rigorous demands of high-pressure, time-sensitive environments.
IVMar 9, 2021
3D-QCNet -- A Pipeline for Automated Artifact Detection in Diffusion MRI imagesAdnan Ahmad, Drew Parker, Zahra Riahi Samani et al.
Artifacts are a common occurrence in Diffusion MRI (dMRI) scans. Identifying and removing them is essential to ensure the accuracy and viability of any post processing carried out on these scans. This makes QC (quality control) a crucial first step prior to any analysis of dMRI data. Several QC methods for artifact detection exist, however they suffer from problems like requiring manual intervention and the inability to generalize across different artifacts and datasets. In this paper, we propose an automated deep learning (DL) pipeline that utilizes a 3D-Densenet architecture to train a model on diffusion volumes for automatic artifact detection. Our method is applied on a vast dataset consisting of 9000 volumes sourced from 7 large clinical datasets. These datasets comprise scans from multiple scanners with different gradient directions, high and low b values, single shell and multi shell acquisitions. Additionally, they represent diverse subject demographics like the presence or absence of pathologies. Our QC method is found to accurately generalize across this heterogenous data by correctly detecting 92% artifacts on average across our test set. This consistent performance over diverse datasets underlines the generalizability of our method, which currently is a significant barrier hindering the widespread adoption of automated QC techniques. For these reasons, we believe that 3D-QCNet can be integrated in diffusion pipelines to effectively automate the arduous and time-intensive process of artifact detection.
CLJan 21, 2020
Shared task: Lexical semantic change detection in German (Student Project Report)Adnan Ahmad, Kiflom Desta, Fabian Lang et al.
Recent NLP architectures have illustrated in various ways how semantic change can be captured across time and domains. However, in terms of evaluation there is a lack of benchmarks to compare the performance of these systems against each other. We present the results of the first shared task on unsupervised lexical semantic change detection (LSCD) in German based on the evaluation framework proposed by Schlechtweg et al. (2019).