Ravenna Thielstrom

h-index 6

5 papers

2,018 citations

Novelty 30%

AI Score 29

Ranked #143,703 of 194,257 authors (top 74%); #25,543 in CL (top 83%)

5 Papers

11.8 CV Apr 22, 2025

Vision language models are unreliable at trivial spatial cognition

Sangeet Khemlani, Tyler Tran, Nathaniel Gyory et al.

Vision language models (VLMs) are designed to extract relevant visuospatial information from images. Some research suggests that VLMs can exhibit humanlike scene understanding, while other investigations reveal difficulties in their ability to process relational information. To achieve widespread applicability, VLMs must perform reliably, yielding comparable competence across a wide variety of related tasks. We sought to test how reliable these architectures are at engaging in trivial spatial cognition, e.g., recognizing whether one object is left of another in an uncluttered scene. We developed a benchmark dataset -- TableTest -- whose images depict 3D scenes of objects arranged on a table, and used it to evaluate state-of-the-art VLMs. Results show that performance could be degraded by minor variations of prompts that use logically equivalent descriptions. These analyses suggest limitations in how VLMs may reason about spatial relations in real-world applications. They also reveal novel opportunities for bolstering image caption corpora for more efficient training and testing.

4.1 RO May 4, 2020

"Can you do this?" Self-Assessment Dialogues with Autonomous Robots Before, During, and After a Mission

Tyler Frasca, Evan Krause, Ravenna Thielstrom et al.

Autonomous robots with sophisticated capabilities can make it difficult for human instructors to assess their capabilities and proficiencies. Therefore, it is important that future robots be able to introspect on their capabilities and assess their task performance. Introspection allows the robot to determine what it can accomplish, and self-assessment allows the robot to estimate the likelihood that it will accomplish a given task. We introduce a general framework for introspection and self-assessment that enables robots to have task and performance-based dialogues before, during, and after a mission. We then realize aspects of the framework in the cognitive robotic DIARC architecture, and finally show a proof-of-concept demonstration on a Nao robot showing its self-assessment capabilities before, during, and after an instructed task.

30.0 CL Nov 1, 2019

Engaging in Dialogue about an Agent's Norms and Behaviors

Daniel Kasenberg, Antonio Roque, Ravenna Thielstrom et al.

We present a set of capabilities allowing an agent planning with moral and social norms represented in temporal logic to respond to queries about its norms and behaviors in natural language, and for the human user to add and remove norms directly in natural language. The user may also pose hypothetical modifications to the agent's norms and inquire about their effects.

30.1 CL Nov 1, 2019

Generating Justifications for Norm-Related Agent Decisions

Daniel Kasenberg, Antonio Roque, Ravenna Thielstrom et al.

We present an approach to generating natural language justifications of decisions derived from norm-based reasoning. Assuming an agent which maximally satisfies a set of rules specified in an object-oriented temporal logic, the user can ask factual questions (about the agent's rules, actions, and the extent to which the agent violated the rules) as well as "why" questions that require the agent to compare actual behavior to counterfactual trajectories with respect to these rules. To produce natural-sounding explanations, we focus on the subproblem of producing natural language clauses from statements in a fragment of temporal logic, and then describe how to embed these clauses into explanatory sentences. We use a human judgment evaluation on a testbed task to compare our approach to variants in terms of intelligibility, mental model, and perceived trust.

2.9 RO Nov 26, 2018

Augmenting Robot Knowledge Consultants with Distributed Short Term Memory

Tom Williams, Ravenna Thielstrom, Evan Krause et al.

Human-robot communication in situated environments involves a complex interplay between knowledge representations across a wide variety of modalities. Crucially, linguistic information must be associated with representations of objects, locations, people, and goals, which may be represented in very different ways. In previous work, we developed a Consultant Framework that facilitates modality-agnostic access to information distributed across a set of heterogeneously represented knowledge sources. In this work, we draw inspiration from cognitive science to augment these distributed knowledge sources with Short Term Memory Buffers to create an STM-augmented algorithm for referring expression generation. We then discuss the potential performance benefits of this approach and insights from cognitive science that may inform future refinements in the design of our approach.