Murtuza N. Shergadwala

HC
3 papers
1 citation
Novelty 38%
AI Score 31

3 Papers

SE · Jan 16
The Stability Trap: Evaluating the Reliability of LLM-Based Instruction Adherence Auditing

Murtuza N. Shergadwala

The enterprise governance of Generative AI (GenAI) in regulated sectors, such as Human Resources (HR), demands scalable yet reproducible auditing mechanisms. While Large Language Model (LLM)-as-a-Judge approaches offer scalability, their reliability in evaluating adherence of different types of system instructions remains unverified. This study asks: To what extent does the instruction type of an Application Under Test (AUT) influence the stability of judge evaluations? To address this, we introduce the Scoped Instruction Decomposition Framework to classify AUT instructions into Objective and Subjective types, isolating the factors that drive judge instability. We applied this framework to two representative HR GenAI applications, evaluating the stability of four judge architectures over variable runs. Our results reveal a "Stability Trap" characterized by a divergence between Verdict Stability and Reasoning Stability. While judges achieved near-perfect verdict agreement (>99%) for both objective and subjective evaluations, their accompanying justification traces diverged significantly. Objective instructions requiring quantitative analysis, such as word counting, exhibited reasoning stability as low as ≈19%, driven by variances in numeric justifications. Similarly, reasoning stability for subjective instructions varied widely (35%–83%) based on evidence granularity, with feature-specific checks failing to reproduce consistent rationale. Conversely, objective instructions focusing on discrete entity extraction achieved high reasoning stability (>90%). These findings demonstrate that high verdict stability can mask fragile reasoning. Thus, we suggest that auditors scope automated evaluation protocols strictly: delegate all deterministically verifiable logic to code, while reserving LLM judges for complex semantic evaluation.

HC · Jul 26, 2021
Can we infer player behavior tendencies from a player's decision-making data? Integrating Theory of Mind to Player Modeling

Murtuza N. Shergadwala, Zhaoqing Teng, Magy Seif El-Nasr

Game AI systems need a theory of mind, which is the humanistic ability to infer others' mental models, preferences, and intent. Such systems would enable inferring players' behavior tendencies that contribute to the variations in their decision-making behaviors. To that end, in this paper, we propose the use of inverse Bayesian inference to infer behavior tendencies given a descriptive cognitive model of a player's decision-making. The model embeds behavior tendencies as weight parameters in a player's decision-making. Inferences on such parameters provide intuitive interpretations about a player's cognition while making in-game decisions. We illustrate the use of inverse Bayesian inference with synthetically generated data in a game called BoomTown developed by Gallup. We use the proposed model to infer a player's behavior tendencies for moving decisions on a game map. Our results indicate that our model is able to infer these parameters towards uncovering not only a player's decision-making but also their behavior tendencies for making such decisions.

HC · Mar 8, 2021
Esports Agents with a Theory of Mind: Towards Better Engagement, Education, and Engineering

Murtuza N. Shergadwala, Magy Seif El-Nasr

The role of AI in esports is shifting from leveraging games as a testbed for improving AI algorithms to addressing the needs of esports players, such as enhancing their gaming experience, improving their esports skills, and providing coaching. For AI to be able to effectively address such needs in esports, AI agents require a theory of mind, that is, the ability to infer players' tactics and intents. To that end, in this position paper, we argue for human-in-the-loop approaches for the discovery and computational embedding of the theory of mind within behavioral models of esports players. We discuss how such approaches can be enabled by player-centric investigations on situated cognition that will expand our understanding of the cognitive and other unobservable factors that influence esports players' behaviors. We conclude by discussing the implications of such a research direction in esports as well as broader implications in engineering design and design education.