Jonathan Dodge

HC
h-index3
10papers
710citations
Novelty33%
AI Score28

10 Papers

HCAug 23, 2024
How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

Sujay Koujalgi, Andrew Anderson, Iyadunni Adenuga et al.

Assessing an AI system's behavior-particularly in Explainable AI Systems-is sometimes done empirically, by measuring people's abilities to predict the agent's next move-but how to perform such measurements? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing is failing to capture the nuances of the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then uses these bases to perform two analyses on sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second is a re-analysis of a prior study on a size-4 action space. Other researchers adopting our operationalization of the prediction task and analysis methodology will improve the rigor of user studies conducted with that task, which is particularly important when the domain features a large output space.

CLJan 29, 2025
Fake News Detection After LLM Laundering: Measurement and Explanation

Rupak Kumar Das, Jonathan Dodge

With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the field of detecting LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular, determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study contributes: (1) Detectors struggle to detect LLM-paraphrased fake news more than human-written text, (2) We find which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity). (3) Via LIME explanations, we discovered a possible reason for detection failures: sentiment shift. (4) We discover a worrisome trend for paraphrase quality measurement: samples that exhibit sentiment shift despite a high BERTSCORE. (5) We provide a pair of datasets augmenting existing datasets with paraphrase outputs and scores. The dataset is available on GitHub

CLApr 17, 2024
Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service

Shikha Soneji, Mitchell Hoesing, Sujay Koujalgi et al.

The complexities of legalese in terms and policy documents can bind individuals to contracts they do not fully comprehend, potentially leading to uninformed data sharing. Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents, aiming to enhance user understanding and facilitate informed decisions. We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score. Leveraging our best-performing model, RoBERTa, we highlighted redundancies and potential guideline violations by identifying overlaps in GDPR-required documents, underscoring the necessity for stricter GDPR compliance.

AIFeb 15, 2024
Experiments with Encoding Structured Data for Neural Networks

Sujay Nagesh Koujalgi, Jonathan Dodge

The project's aim is to create an AI agent capable of selecting good actions in a game-playing domain called Battlespace. Sequential domains like Battlespace are important testbeds for planning problems, as such, the Department of Defense uses such domains for wargaming exercises. The agents we developed combine Monte Carlo Tree Search (MCTS) and Deep Q-Network (DQN) techniques in an effort to navigate the game environment, avoid obstacles, interact with adversaries, and capture the flag. This paper will focus on the encoding techniques we explored to present complex structured data stored in a Python class, a necessary precursor to an agent.

AISep 28, 2021
Identifying Reasoning Flaws in Planning-Based RL Using Tree Explanations

Kin-Ho Lam, Zhengxian Lin, Jed Irvine et al.

Enabling humans to identify potential flaws in an agent's decision making is an important Explainable AI application. We consider identifying such flaws in a planning-based deep reinforcement learning (RL) agent for a complex real-time strategy game. In particular, the agent makes decisions via tree search using a learned model and evaluation function over interpretable states and actions. This gives the potential for humans to identify flaws at the level of reasoning steps in the tree, even if the entire reasoning process is too complex to understand. However, it is unclear whether humans will be able to identify such flaws due to the size and complexity of trees. We describe a user interface and case study, where a small group of AI experts and developers attempt to identify reasoning flaws due to inaccurate agent learning. Overall, the interface allowed the group to identify a number of significant flaws of varying types, demonstrating the promise of this approach.

HCMar 22, 2019
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

Andrew Anderson, Jonathan Dodge, Amrita Sadarangani et al.

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124 participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars were needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

HCJan 23, 2019
Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

Jonathan Dodge, Q. Vera Liao, Yunfeng Zhang et al.

Ensuring fairness of machine learning systems is a human-in-the-loop process. It relies on developers, users, and the general public to identify fairness problems and make improvements. To facilitate the process we need effective, unbiased, and user-friendly explanations that people can confidently rely on. Towards that end, we conducted an empirical study with four types of programmatically generated explanations to understand how they impact people's fairness judgments of ML systems. With an experiment involving more than 160 Mechanical Turk workers, we show that: 1) Certain explanations are considered inherently less fair, while others can enhance people's confidence in the fairness of the algorithm; 2) Different fairness problems--such as model-wide fairness issues versus case-specific fairness discrepancies--may be more effectively exposed through different styles of explanation; 3) Individual differences, including prior positions and judgment criteria of algorithmic fairness, impact how people react to different styles of explanation. We conclude with a discussion on providing personalized and adaptive explanations to support fairness judgments of ML systems.

HCNov 21, 2017
Toward Foraging for Understanding of StarCraft Agents: An Empirical Study

Sean Penney, Jonathan Dodge, Claudia Hilderbrand et al.

Assessing and understanding intelligent agents is a difficult task for users that lack an AI background. A relatively new area, called "Explainable AI," is emerging to help address this problem, but little is known about how users would forage through information an explanation system might offer. To inform the development of Explainable AI systems, we conducted a formative study, using the lens of Information Foraging Theory, into how experienced users foraged in the domain of StarCraft to assess an agent. Our results showed that participants faced difficult foraging problems. These foraging problems caused participants to entirely miss events that were important to them, reluctantly choose to ignore actions they did not want to ignore, and bear high cognitive, navigation, and information costs to access the information they needed.

HCNov 19, 2017
How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games

Jonathan Dodge, Sean Penney, Claudia Hilderbrand et al.

How should an AI-based explanation system explain an agent's complex behavior to ordinary end users who have no background in AI? Answering this question is an active research area, for if an AI-based explanation system could effectively explain intelligent agents' behavior, it could enable the end users to understand, assess, and appropriately trust (or distrust) the agents attempting to help them. To provide insights into this question, we turned to human expert explainers in the real-time strategy domain, "shoutcaster", to understand (1) how they foraged in an evolving strategy game in real time, (2) how they assessed the players' behaviors, and (3) how they constructed pertinent and timely explanations out of their insights and delivered them to their audience. The results provided insights into shoutcasters' foraging strategies for gleaning information necessary to assess and explain the players; a characterization of the types of implicit questions shoutcasters answered; and implications for creating explanations by using the patterns

AIOct 31, 2017
Visualizing and Understanding Atari Agents

Sam Greydanus, Anurag Koul, Jonathan Dodge et al.

While deep reinforcement learning (deep RL) agents are effective at maximizing rewards, it is often unclear what strategies they use to do so. In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. In particular, we focus on using saliency maps to understand how an agent learns and executes a policy. We introduce a method for generating useful saliency maps and use it to show 1) what strong agents attend to, 2) whether agents are making decisions for the right or wrong reasons, and 3) how agents evolve during learning. We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into an RL agent's decisions and learning behavior.