75.2 | CL | May 21 | Code
When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance
Brett Israelsen, Sheryl Carty, Josh Coates et al.
We ask whether large language models (LLMs) treat queries about religious conversion symmetrically. The answer is no. When asked for advice on hypothetical faith transitions from one religion to another, then asked the reversed question, models exhibited consistent asymmetries, favoring some religions while subtly discouraging conversion to others. On average Catholic, Bahá'í, and Sikh religions were broadly favored (high support for joining, low support for leaving), while Atheists, Agnostics, and Jehovah's Witnesses were primarily disfavored. Patterns varied by model size and model provider, with Grok 4.20 exhibiting the strongest asymmetries. We tested 20 commercial and open-source language models across 182 religion pairings using a human-verified LLM-as-a-judge framework. Each model was probed via interactions with a simulated user asking for advice on a potential faith conversion. Models tended to use more encouraging language for some faith transitions over others; these patterns were systematically repeatable across multiple trials. All LLMs tested exhibited reproducible asymmetry, though the pattern of preferences differed for each. Overall preferences persist across multiple question phrasings and variations in the religious pairing dataset. Taken together, these results suggest that asymmetry is a robust property of model behavior rather than an artifact of how the models' answers were scored. It is important to consider that any imbalances deployed and reproduced en masse can have real-world implications.
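The notion of "high support for joining, low support for leaving" can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not give the judge's scoring scale or the aggregation formula, so the `[0, 1]` support scale, the function names `pair_asymmetry` and `net_favorability`, and the toy labels "A", "B", "C" are all hypothetical and are not the authors' method.

```python
from collections import defaultdict


def pair_asymmetry(scores, a, b):
    """Support for converting a -> b minus support for converting b -> a.

    `scores` maps (source, target) -> a judge-assigned support value.
    The [0, 1] scale is an assumption, not taken from the paper.
    """
    return scores[(a, b)] - scores[(b, a)]


def net_favorability(scores):
    """Mean support for joining each religion minus mean support for leaving it."""
    joining = defaultdict(list)  # scores of conversions that end in this religion
    leaving = defaultdict(list)  # scores of conversions that start from this religion
    for (source, target), support in scores.items():
        leaving[source].append(support)
        joining[target].append(support)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    religions = set(joining) | set(leaving)
    return {r: mean(joining[r]) - mean(leaving[r]) for r in religions}


# Toy data with placeholder labels (hypothetical values, not results from the paper).
toy_scores = {
    ("A", "B"): 0.8, ("B", "A"): 0.4,
    ("A", "C"): 0.6, ("C", "A"): 0.6,
    ("B", "C"): 0.5, ("C", "B"): 0.7,
}
net = net_favorability(toy_scores)
```

Under this toy scoring, a positive `net` value marks a religion the model nudges users toward, a negative value marks one it nudges users away from, and a perfectly symmetric model would give zero for every ordered pair in `pair_asymmetry`.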
CL | Dec 5, 2023
LLMs for Multi-Modal Knowledge Extraction and Analysis in Intelligence/Safety-Critical Applications
Brett Israelsen, Soumalya Sarkar
Large Language Models have seen rapid progress in capability in recent years; this progress has been accelerating and their capabilities, measured by various benchmarks, are beginning to approach those of humans. There is a strong demand to use such models in a wide variety of applications but, due to unresolved vulnerabilities and limitations, great care must be taken before applying them to intelligence and safety-critical applications. This paper reviews recent literature related to LLM assessment and vulnerabilities to synthesize the current research landscape and to help understand what advances are most critical to enable use of these technologies in intelligence and safety-critical applications. The vulnerabilities are broken down into ten high-level categories and overlaid onto a high-level life cycle of an LLM. Some general categories of mitigations are reviewed.
HC | May 4, 2021
Evaluating Metrics for Standardized Benchmarking of Remote Presence Systems
Charles Peasley, Rachel Dianiska, Emily Oldham et al.
To reduce the need for business-related air travel and its associated energy consumption and carbon footprint, the U.S. Department of Energy's ARPA-E is supporting a research project called SCOTTIE - Systematic Communication Objectives and Telecommunications Technology Investigations and Evaluations. SCOTTIE tests virtual and augmented reality platforms in a functional comparison with face-to-face (FtF) interactions to derive travel replacement thresholds for common industrial training scenarios. The primary goal of Study 1 is to match the communication effectiveness and learning outcomes obtained from a FtF control using virtual reality (VR) training scenarios in which a local expert with physical equipment trains a remote apprentice without physical equipment immediately present. This application scenario is commonplace in industrial settings where access to expensive equipment and materials is limited and a number of apprentices must travel to a central location in order to undergo training. Supplying an empirically validated virtual training alternative constitutes a readily adoptable use-case for businesses looking to reduce time and monetary expenditures associated with travel. The three virtual presence technologies used were strategically selected for feasibility, relatively low cost, business relevance, and potential for impact through transition. The authors suggest that the results of this study might generalize to the challenge of virtual conferences.
ML | Dec 13, 2016
Hybrid Repeat/Multi-point Sampling for Highly Volatile Objective Functions
Brett Israelsen, Nisar Ahmed
A key drawback of the current generation of artificial decision-makers is that they do not adapt well to changes in unexpected situations. This paper addresses the situation in which an AI for aerial dogfighting, with tunable parameters that govern its behavior, will optimize behavior with respect to an objective function that must be evaluated and learned through simulations. Once this objective function has been modeled, the agent can then choose its desired behavior in different situations. Bayesian optimization with a Gaussian Process surrogate is used as the method for investigating the objective function. One key benefit is that during optimization the Gaussian Process learns a global estimate of the true objective function, with predicted outcomes and a statistical measure of confidence in areas that haven't been investigated yet. However, standard Bayesian optimization does not perform consistently or provide an accurate Gaussian Process surrogate function for highly volatile objective functions. We treat these problems by introducing a novel sampling technique called Hybrid Repeat/Multi-point Sampling. This technique gives the AI the ability to learn optimum behaviors in a highly uncertain environment. More importantly, it not only improves the reliability of the optimization, but also creates a better model of the entire objective surface. With this improved model the agent is equipped to better adapt behaviors.
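To make concrete why repeated evaluations can help a Gaussian Process surrogate on a noisy objective, the standard GP regression posterior is written out below for the case where each sampled point is evaluated several times and the results are averaged. This is textbook GP regression under stated assumptions (zero prior mean, independent Gaussian noise with variance $\sigma_n^2$), not the Hybrid Repeat/Multi-point Sampling algorithm itself, whose details the abstract does not give. The symbols $n_i$, $\bar{y}_i$, $K$, $\mathbf{k}_*$, and $\Sigma$ are notation introduced here for illustration.

```latex
% Point x_i is evaluated n_i times; \bar{y}_i is the average of those evaluations.
% Averaging independent noise of variance \sigma_n^2 leaves variance \sigma_n^2 / n_i.
\begin{align}
  \Sigma &= \operatorname{diag}\!\left(\frac{\sigma_n^2}{n_1}, \dots, \frac{\sigma_n^2}{n_m}\right), \\
  \mu(x_*) &= \mathbf{k}_*^{\top} \left(K + \Sigma\right)^{-1} \bar{\mathbf{y}}, \\
  \sigma^2(x_*) &= k(x_*, x_*) - \mathbf{k}_*^{\top} \left(K + \Sigma\right)^{-1} \mathbf{k}_*,
\end{align}
% where K_{ij} = k(x_i, x_j) and (\mathbf{k}_*)_i = k(x_i, x_*).
```

Under these assumptions, increasing $n_i$ shrinks the corresponding entry of $\Sigma$, so the surrogate trusts the averaged value at $x_i$ more and its predictive variance near $x_i$ decreases, at the cost of spending more simulations per point.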