CL Jun 13, 2022
Mediators: Conversational Agents Explaining NLP Model Behavior
Nils Feldhus, Ajay Madhavan Ravichandran, Sebastian Möller
The human-centric explainable artificial intelligence (HCXAI) community has raised the need for framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for Mediators, text-based conversational agents which are capable of explaining the behavior of neural models interactively using natural language. From the perspective of natural language processing (NLP) research, we engineer a blueprint of such a Mediator for the task of sentiment analysis and assess how far along current research is on the path towards dialogue-based explanations.
CL Sep 17, 2025
Integrating Text and Time-Series into (Large) Language Models to Predict Medical Outcomes
Iyadh Ben Cheikh Larbi, Ajay Madhavan Ravichandran, Aljoscha Burchardt et al.
Large language models (LLMs) excel at text generation, but their ability to handle clinical classification tasks involving structured data, such as time series, remains underexplored. In this work, we adapt instruction-tuned LLMs using DSPy-based prompt optimization to process clinical notes and structured EHR inputs jointly. Our results show that this approach achieves performance on par with specialized multimodal systems while requiring less complexity and offering greater adaptability across tasks.
LG Jun 17, 2025
One Size Fits None: Rethinking Fairness in Medical AI
Roland Roller, Michael Hahn, Ajay Madhavan Ravichandran et al.
Machine learning (ML) models are increasingly used to support clinical decision-making. However, real-world medical datasets are often noisy, incomplete, and imbalanced, leading to performance disparities across patient subgroups. These differences raise fairness concerns, particularly when they reinforce existing disadvantages for marginalized groups. In this work, we analyze several medical prediction tasks and demonstrate how model performance varies with patient characteristics. While ML models may demonstrate good overall performance, we argue that subgroup-level evaluation is essential before integrating them into clinical workflows. A performance analysis at the subgroup level makes these differences clearly identifiable, allowing, on the one hand, for performance disparities to be considered in clinical practice, and on the other hand, for these insights to inform the responsible development of more effective models. Our work thereby contributes to a practical discussion around the subgroup-sensitive development and deployment of medical ML models and the interconnectedness of fairness and transparency.
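To make the idea of subgroup-level evaluation concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes binary labels and a single categorical patient attribute; the function name `subgroup_report`, the choice of accuracy and recall as metrics, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def _accuracy(pairs):
    # Fraction of (true, predicted) pairs that agree.
    return sum(t == p for t, p in pairs) / len(pairs)


def _recall(pairs):
    # Fraction of true positives (label 1) that were predicted as 1.
    # Returns None when the subgroup contains no positive cases.
    positives = [p for t, p in pairs if t == 1]
    if not positives:
        return None
    return sum(p == 1 for p in positives) / len(positives)


def subgroup_report(y_true, y_pred, groups):
    """Compute overall and per-subgroup accuracy and recall.

    Hypothetical helper: y_true and y_pred are binary labels (0/1),
    groups holds one subgroup identifier per sample.
    """
    buckets = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g].append((t, p))

    all_pairs = list(zip(y_true, y_pred))
    return {
        "overall": {
            "n": len(all_pairs),
            "accuracy": _accuracy(all_pairs),
            "recall": _recall(all_pairs),
        },
        "subgroups": {
            g: {"n": len(pairs), "accuracy": _accuracy(pairs), "recall": _recall(pairs)}
            for g, pairs in buckets.items()
        },
    }


# Toy data (invented): subgroup "B" is small and poorly served by the model.
y_true = [1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "B", "B"]

report = subgroup_report(y_true, y_pred, groups)
for name, stats in report["subgroups"].items():
    print(name, stats)
print("overall", report["overall"])
```

In this toy example the overall accuracy looks acceptable while the smaller subgroup is misclassified entirely, which is the kind of disparity an aggregate metric alone would hide.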