Maryam Mustafa

HC
h-index: 29
6 papers
1 citation
Novelty: 24%
AI Score: 42

6 Papers

HC · Feb 9
Kissan-Dost: Bridging the Last Mile in Smallholder Precision Agriculture with Conversational IoT

Muhammad Saad Ali, Daanish U. Khan, Laiba Intizar Ahmad et al.

We present Kissan-Dost, a multilingual, sensor-grounded conversational system that turns live on-farm measurements and weather into plain-language guidance delivered over WhatsApp text or voice. The system couples commodity soil and climate sensors with retrieval-augmented generation, then enforces grounding, traceability, and proactive alerts through a modular pipeline. In a 90-day, two-site pilot with five participants, we ran three phases (baseline, dashboard only, chatbot only). Dashboard engagement was sporadic and faded, while the chatbot was used nearly daily and informed concrete actions. Controlled tests on 99 sensor-grounded crop queries achieved over 90 percent correctness with subsecond end-to-end latency, alongside high-quality translation outputs. Results show that careful last-mile integration, not novel circuitry, unlocks the latent value of existing Agri-IoT for smallholders.

HC · Apr 24
How GenAI is Helping Reimagine Antenatal Care in A Low-Resource Setting: From Provider Enablement to Patient Empowerment

Maryam Mustafa, Imaan Hameed, Amna Shahnawaz et al.

Despite steady global advances, maternal mortality remains alarmingly high in Pakistan (155 deaths per 100,000 live births in 2023), largely as a consequence of fragmented paper records, low literacy, poor access to quality healthcare, and gendered barriers that compromise care continuity. Over three years, we designed, deployed, and iteratively developed Awaaz-e-Sehat, a speech-based artificial intelligence (AI) system that generates electronic medical records (EMRs) and supports decision-making in maternal health. The tool evolved from a clinician-facing AI assistant that automated Urdu speech-to-EMR generation into a patient-centred WhatsApp-based platform, enabling women to generate their own structured clinical notes, receive AI-generated antenatal guidance, and share QR-coded records with providers anywhere in the country. This case study documents that translational journey, i.e., how the ground realities of workload, linguistic nuance, and infrastructural constraints reshaped our design. The result is not merely a new method of record-keeping, but a reimagining of antenatal care and electronic medical records themselves. In settings where clinicians are time-constrained and have little institutional incentive to document, Awaaz-e-Sehat proposes a model of care that centres patients as active participants in generating and owning their health data. By keeping patients informed about their own risk factors and integrating them into the clinical decision-support loop, the system transforms EMRs and CDSS from static institutional artefacts into dynamic tools for self-advocacy and shared accountability in maternal health.

HC · Apr 7
Designing Around Stigma: Human-Centered LLMs for Menstrual Health

Amna Shahnawaz, Ayesha Shafique, Ding Wang et al.

Menstrual health education (MHE) in Pakistan is constrained by cultural taboos and inadequate formal curricula, leaving women with few trusted resources to lean on. In response to these challenges, we introduce a WhatsApp-based chatbot powered by a large language model (LLM) and Retrieval Augmented Generation (RAG), co-designed with Pakistani college women. Workshops (N=30) revealed key design requirements: support for Roman Urdu, use of subsidized platforms, and an expert-curated knowledge base. We then deployed the chatbot with 13 participants for two weeks (403 messages and interviews). Women used it to challenge cultural taboos, legitimize health concerns often dismissed as normal, and build reproductive health knowledge through iterative questioning. Yet, interactions also exposed tensions: reliance on cultural explanatory models, questions of trust and validation, and the gendered persona of the chatbot itself. We contribute empirical insights, a stigma-aware design framework for culturally sensitive conversational AI, and a methodological lens foregrounding expert validation in intimate health domains.

CY · Oct 31, 2025
Between Myths and Metaphors: Rethinking LLMs for SRH in Conservative Contexts

Ameemah Humayun, Bushra Zubair, Maryam Mustafa

Low-resource countries account for over 90% of maternal deaths, with Pakistan among the top four countries contributing nearly half in 2023. Since these deaths are mostly preventable, large language models (LLMs) can help address this crisis by automating health communication and risk assessment. However, sexual and reproductive health (SRH) communication in conservative contexts often relies on indirect language that obscures meaning, complicating LLM-based interventions. We conduct a two-stage study in Pakistan: (1) analyzing data from clinical observations, interviews, and focus groups with clinicians and patients, and (2) evaluating the interpretive capabilities of five popular LLMs on this data. Our analysis identifies two axes of communication (referential domain and expression approach) and shows LLMs struggle with semantic drift, myths, and polysemy in clinical interactions. We contribute: (1) empirical themes in SRH communication, (2) a categorization framework for indirect communication, (3) evaluation of LLM performance, and (4) design recommendations for culturally-situated SRH communication.

AI · Oct 5, 2025
A global log for medical AI

Ayush Noori, Adam Rodman, Alan Karthikesalingam et al.

Modern computer systems often rely on syslog, a simple, universal protocol that records every critical event across heterogeneous infrastructure. However, healthcare's rapidly growing clinical AI stack has no equivalent. As hospitals rush to pilot large language models and other AI-based clinical decision support tools, we still lack a standard way to record how, when, by whom, and for whom these AI models are used. Without that transparency and visibility, it is challenging to measure real-world performance and outcomes, detect adverse events, or correct bias or dataset drift. In the spirit of syslog, we introduce MedLog, a protocol for event-level logging of clinical AI. Any time an AI model is invoked to interact with a human, interface with another algorithm, or act independently, a MedLog record is created. This record consists of nine core fields: header, model, user, target, inputs, artifacts, outputs, outcomes, and feedback, providing a structured and consistent record of model activity. To encourage early adoption, especially in low-resource settings, and minimize the data footprint, MedLog supports risk-based sampling, lifecycle-aware retention policies, and write-behind caching; detailed traces for complex, agentic, or multi-stage workflows can also be captured under MedLog. MedLog can catalyze the development of new databases and software to store and analyze MedLog records. Realizing this vision would enable continuous surveillance, auditing, and iterative improvement of medical AI, laying the foundation for a new form of digital epidemiology.
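As a rough illustration of the record structure described in this abstract, the sketch below models the nine core fields as a Python dataclass and adds a simple risk-based sampling check. Only the nine field names come from the abstract. The field types, the `SAMPLE_RATES` tiers and rates, and the `should_log` helper are hypothetical choices made for this example, not part of the MedLog protocol as published.

```python
import json
import random
from dataclasses import dataclass, field, asdict
from typing import Optional

# Hypothetical sampling rates per risk tier. The abstract mentions risk-based
# sampling but does not define tiers or rates; these values are assumptions.
SAMPLE_RATES = {"high": 1.0, "medium": 0.5, "low": 0.1}


@dataclass
class MedLogRecord:
    """One event-level record holding the nine core fields named in the abstract.

    The field types are assumptions made for illustration only.
    """
    header: dict                                  # e.g. record id, timestamp
    model: dict                                   # which AI model was invoked
    user: dict                                    # who invoked it
    target: dict                                  # for whom it was used
    inputs: dict                                  # what the model received
    artifacts: list = field(default_factory=list)
    outputs: dict = field(default_factory=dict)
    outcomes: Optional[dict] = None               # may only be known later
    feedback: Optional[dict] = None               # may only be known later

    def to_json(self) -> str:
        """Serialize the record to a JSON string with sorted keys."""
        return json.dumps(asdict(self), sort_keys=True)


def should_log(risk_tier: str, rng: random.Random) -> bool:
    """Decide whether to keep a record, sampling by (hypothetical) risk tier.

    Unknown tiers default to a rate of 1.0, so they are always logged.
    """
    rate = SAMPLE_RATES.get(risk_tier, 1.0)
    return rng.random() < rate


if __name__ == "__main__":
    record = MedLogRecord(
        header={"record_id": "example-1"},
        model={"name": "example-model"},
        user={"role": "clinician"},
        target={"patient_ref": "example-patient"},
        inputs={"prompt": "example input"},
    )
    if should_log("high", random.Random()):
        print(record.to_json())
```

Leaving `outcomes` and `feedback` optional reflects the assumption that they are filled in after the model call, while the other fields are known at invocation time.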

HC · Jun 17, 2021
Investigating Misinformation Dissemination on Social Media in Pakistan

Danyal Haroon, Hammad Arif, Ahmed Abdullah Tariq et al.

Fake news and misinformation are among the most significant challenges brought about by advances in communication technologies. We chose to research the spread of fake news in Pakistan because of some unfortunate incidents that took place during 2020. These included the downplaying of the severity of the COVID-19 pandemic, and protests by right-wing political movements. We observed that fake news and misinformation contributed significantly to these events and especially affected low-literate and low-income populations. We conducted a cross-platform comparison of misinformation on WhatsApp, Twitter and YouTube with a primary focus on messages shared in public WhatsApp groups, and analysed the characteristics of misinformation, techniques used to make it believable, and how users respond to it. To the best of our knowledge, this is the first attempt to compare misinformation on all three platforms in Pakistan. Data collected over a span of eight months helped us identify fake news and misinformation related to politics, religion and health, among other categories. Common elements used by fake news creators in Pakistan to make false content seem believable included: appeals to emotion, conspiracy theories, political and religious polarization, incorrect facts and impersonation of credible sources.