CL · Mar 27, 2023
Causal schema induction for knowledge discovery
Michael Regan, Jena D. Hwang, Keisuke Sakaguchi et al. · uw
Making sense of familiar yet new situations typically involves making generalizations about causal schemas, stories that help humans reason about event sequences. Reasoning about events includes identifying cause and effect relations shared across event instances, a process we refer to as causal schema induction. Statistical schema induction systems may leverage structural knowledge encoded in discourse or the causal graphs associated with event meaning; however, resources to study such causal structure are few in number and limited in size. In this work, we investigate how to apply schema induction models to the task of knowledge discovery for enhanced search of English-language news texts. To tackle the problem of data scarcity, we present Torquestra, a manually curated dataset of text-graph-schema units integrating temporal, event, and causal structures. We benchmark our dataset on three knowledge discovery tasks, building and evaluating models for each. Results show that systems that harness causal structure are effective at identifying texts sharing similar causal meaning components rather than relying on lexical cues alone. We make our dataset and models available for research purposes.
CL · Oct 20, 2022
Dense Paraphrasing for Textual Enrichment
Jingxuan Tu, Kyeongmin Rim, Eben Holderness et al.
Understanding inferences and answering questions from text requires more than merely recovering surface arguments, adjuncts, or strings associated with the query terms. As humans, we interpret sentences as contextualized components of a narrative or discourse, by both filling in missing information, and reasoning about event consequences. In this paper, we define the process of rewriting a textual expression (lexeme or phrase) such that it reduces ambiguity while also making explicit the underlying semantics that is not (necessarily) expressed in the economy of sentence structure as Dense Paraphrasing (DP). We build the first complete DP dataset, provide the scope and design of the annotation task, and present results demonstrating how this DP process can enrich a source text to improve inferencing and QA task performance. The data and the source code will be publicly available.
CL · Apr 28
Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment
James Pustejovsky, Nikhil Krishnaswamy
We propose Frictive Policy Optimization (FPO), a framework for learning language model policies that regulate not only what to say, but when and how to intervene in order to manage epistemic and normative risk. Unlike standard alignment methods that optimize surface-level preference or task utility, FPO treats clarification, verification, challenge, redirection, and refusal as explicit control actions whose purpose is to shape the evolution of belief, commitment, and uncertainty over time. We formalize alignment as a risk-sensitive epistemic control problem in which intervention decisions are selected based on their expected effect on downstream epistemic quality rather than on immediate reward alone. We introduce a compact taxonomy of frictive interventions, a structured friction functional that operationalizes multiple alignment failure modes, and a unified family of FPO methods spanning reward shaping, preference pairing, group-relative ranking, and risk-conditioned trust regions. We further propose an evaluation framework that measures epistemic competence directly through clarification behavior, calibration, contradiction repair, refusal proportionality, and information efficiency. Together, these results provide a formal and algorithmic foundation for learning agents that are aligned not only in outcome, but in epistemic conduct.
CL · Oct 5, 2016 · Code
ECAT: Event Capture Annotation Tool
Tuan Do, Nikhil Krishnaswamy, James Pustejovsky
This paper introduces the Event Capture Annotation Tool (ECAT), a user-friendly, open-source interface tool for annotating events and their participants in video, capable of extracting the 3D positions and orientations of objects in video captured by Microsoft's Kinect(R) hardware. The modeling language VoxML (Pustejovsky and Krishnaswamy, 2016) underlies ECAT's object, program, and attribute representations, although ECAT uses its own spec for explicit labeling of motion instances. The demonstration will show the tool's workflow and the options available for capturing event-participant relations and browsing visual data. Mapping ECAT's output to VoxML will also be addressed.
CL · Mar 26, 2024
Common Ground Tracking in Multimodal Dialogue
Ibrahim Khebour, Kenneth Lai, Mariah Bradford et al.
Within Dialogue Modeling research in AI and NLP, considerable attention has been paid to "dialogue state tracking" (DST), which is the ability to update the representations of the speaker's needs at each turn in the dialogue by taking into account the past dialogue moves and history. Less studied but just as important to dialogue modeling, however, is "common ground tracking" (CGT), which identifies the shared belief space held by all of the participants in a task-oriented dialogue: the task-relevant propositions all participants accept as true. In this paper we present a method for automatically identifying the current set of shared beliefs and "questions under discussion" (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model to predict moves toward construction of common ground. Model outputs cascade into a set of formal closure rules derived from situated evidence and belief axioms and update operations. We empirically assess the contribution of each feature type toward successful construction of common ground relative to ground truth, establishing a benchmark in this novel, challenging task.
CL · Apr 28
Toward a Functional Geometric Algebra for Natural Language Semantics
James Pustejovsky
Distributional and neural approaches to natural language semantics have been built almost exclusively on conventional linear algebra: vectors, matrices, tensors, and the operations that accompany them. These methods have achieved remarkable empirical success, yet they face persistent structural limitations in compositional semantics, type sensitivity, and interpretability. I argue in this paper that geometric algebra (GA) -- specifically, Clifford algebras -- provides a mathematically superior foundation for semantic representation, and that a Functional Geometric Algebra (FGA) framework extends GA toward a typed, compositional semantics capable of supporting inference, transformation, and interpretability while retaining full compatibility with distributional learning and modern neural architectures. I develop the formal foundations, identify three core capabilities that GA provides and linear algebra does not, present a detailed worked example illustrating operator-level semantic contrasts, and show how GA-based operations already implicit in current transformer architectures can be made explicit and extended. The central claim is not merely increased dimensionality but increased structural organization: GA expands an $n$-dimensional embedding space into a $2^n$ multivector algebra where base semantic concepts and their higher-order interactions are represented within a single, principled algebraic framework.
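The $2^n$ figure in the abstract above follows from the standard grade decomposition of a Clifford algebra. The short derivation below is a general fact about geometric algebra, not a formula taken from the paper; the symbols $V$, $\Lambda^k V$, and $e_i$ are notation assumed here for illustration.

```latex
% Grade decomposition of the Clifford algebra over an n-dimensional space V
% (an isomorphism of vector spaces, not of algebras):
\[
  \mathrm{Cl}(V) \;\cong\; \bigoplus_{k=0}^{n} \Lambda^{k} V,
  \qquad
  \dim \mathrm{Cl}(V) \;=\; \sum_{k=0}^{n} \binom{n}{k} \;=\; 2^{n}.
\]
% Worked case n = 3 with basis vectors e_1, e_2, e_3:
%   grade 0: 1                      (1 scalar)
%   grade 1: e_1, e_2, e_3          (3 vectors)
%   grade 2: e_1e_2, e_1e_3, e_2e_3 (3 bivectors)
%   grade 3: e_1e_2e_3              (1 trivector)
\[
  1 + 3 + 3 + 1 \;=\; 8 \;=\; 2^{3}.
\]
```

Read this way, the grade-1 elements play the role of the original embedding dimensions, and the higher grades are the slots available for their interactions.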
CL · Mar 12, 2025
TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
Hannah VanderHoeven, Brady Bhalla, Ibrahim Khebour et al.
We present TRACE, a novel system for live *common ground* tracking in situated collaborative tasks. With a focus on fast, real-time performance, TRACE tracks the speech, actions, gestures, and visual attention of participants, uses these multimodal inputs to determine the set of task-relevant propositions that have been raised as the dialogue progresses, and tracks the group's epistemic position and beliefs toward them as the task unfolds. Amid increased interest in AI systems that can mediate collaborations, TRACE represents an important step forward for agents that can engage with multiparty, multimodal discourse.
CL · Jun 12, 2025
Dynamic Epistemic Friction in Dialogue
Timothy Obiso, Kenneth Lai, Abhijnan Nath et al.
Recent developments in aligning Large Language Models (LLMs) with human preferences have significantly enhanced their utility in human-AI collaborative scenarios. However, such approaches often neglect the critical role of "epistemic friction," or the inherent resistance encountered when updating beliefs in response to new, conflicting, or ambiguous information. In this paper, we define dynamic epistemic friction as the resistance to epistemic integration, characterized by the misalignment between an agent's current belief state and new propositions supported by external evidence. We position this within the framework of Dynamic Epistemic Logic (Van Benthem and Pacuit, 2011), where friction emerges as nontrivial belief-revision during the interaction. We then present analyses from a situated collaborative task that demonstrate how this model of epistemic friction can effectively predict belief updates in dialogues, and we subsequently discuss how the model of belief alignment as a measure of epistemic resistance or friction can naturally be made more sophisticated to accommodate the complexities of real-world dialogue scenarios.
CL · Dec 8, 2024
Speech Is Not Enough: Interpreting Nonverbal Indicators of Common Knowledge and Engagement
Derek Palmer, Yifan Zhu, Kenneth Lai et al.
Our goal is to develop an AI Partner that can provide support for group problem solving and social dynamics. In multi-party working group environments, multimodal analytics is crucial for identifying non-verbal interactions of group members. In conjunction with their verbal participation, this creates a holistic understanding of collaboration and engagement that provides necessary context for the AI Partner. In this demo, we illustrate our present capabilities at detecting and tracking nonverbal behavior in student task-oriented interactions in the classroom, and the implications for tracking common ground and engagement.
AI · Mar 5
Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry
Yifan Zhu, Mariah Bradford, Kenneth Lai et al.
Establishing common ground, a shared set of beliefs and mutually recognized facts, is fundamental to collaboration, yet remains a challenge for current AI systems, especially in multimodal, multiparty settings, where the collaborators bring different information to the table. We introduce the Distributed Partial Information Puzzle (DPIP), a collaborative construction task that elicits rich multimodal communication under epistemic asymmetry. We present a multimodal dataset of these interactions, annotated and temporally aligned across speech, gesture, and action modalities to support reasoning over propositional content and belief dynamics. We then evaluate two paradigms for modeling common ground (CG): (1) state-of-the-art large language models (LLMs), prompted to infer shared beliefs from multimodal updates, and (2) an axiomatic pipeline grounded in Dynamic Epistemic Logic (DEL) that incrementally performs the same task. Results on the annotated DPIP data indicate that it poses a challenge to modern LLMs' abilities to track both task progression and belief state.
CL · Mar 29, 2024
ChainNet: Structured Metaphor and Metonymy in WordNet
Rowan Hall Maudslay, Simone Teufel, Francis Bond et al.
The senses of a word exhibit rich internal structure. In a typical lexicon, this structure is overlooked: a word's senses are encoded as a list without inter-sense relations. We present ChainNet, a lexical resource which for the first time explicitly identifies these structures. ChainNet expresses how senses in the Open English Wordnet are derived from one another: every nominal sense of a word is either connected to another sense by metaphor or metonymy, or is disconnected in the case of homonymy. Because WordNet senses are linked to resources which capture information about their meaning, ChainNet represents the first dataset of grounded metaphor and metonymy.
CL · Jun 6, 2024
Linguistically Conditioned Semantic Textual Similarity
Jingxuan Tu, Keer Xu, Liulu Yue et al.
Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the inherent ambiguity posed by the sentences, a recent work called Conditional STS (C-STS) has been proposed to measure the sentences' similarity conditioned on a certain aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation on this task. In this paper, we reannotate the C-STS validation set and observe an annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' capability to understand the conditions under a QA task setting. With the generated answers, we present an automatic error identification pipeline that is able to identify annotation errors from the C-STS data with over 80% F1 score. We also propose a new method that largely improves the performance over baselines on the C-STS data by training the models with the answers. Finally, we discuss the conditionality annotation based on the typed-feature structure (TFS) of entity types. We show in examples that the TFS is able to provide a linguistic foundation for constructing C-STS data with new conditions.
CL · May 14, 2024
Computational Thought Experiments for a More Rigorous Philosophy and Science of the Mind
Iris Oved, Nikhil Krishnaswamy, James Pustejovsky et al.
We offer philosophical motivations for a method we call Virtual World Cognitive Science (VW CogSci), in which researchers use virtual embodied agents that are embedded in virtual worlds to explore questions in the field of Cognitive Science. We focus on questions about mental and linguistic representation and the ways that such computational modeling can add rigor to philosophical thought experiments, as well as the terminology used in the scientific study of such representations. We find that this method forces researchers to take a god's-eye view when describing dynamical relationships between entities in minds and entities in an environment in a way that eliminates the need for problematic talk of belief and concept types, such as the belief that cats are silly, and the concept CAT, while preserving belief and concept tokens in individual cognizers' minds. We conclude with some further key advantages of VW CogSci for the scientific study of mental and linguistic representation and for Cognitive Science more broadly.
CL · May 22, 2023
An Abstract Specification of VoxML as an Annotation Language
Kiyong Lee, Nikhil Krishnaswamy, James Pustejovsky
VoxML is a modeling language used to map natural language expressions into real-time visualizations using commonsense semantic knowledge of objects and events. Its utility has been demonstrated in embodied simulation environments and in agent-object interactions in situated multimodal human-agent collaboration and communication. It introduces the notion of object affordance (both Gibsonian and Telic) from HRI and robotics, as well as the concept of habitat (an object's context of use) for interactions between a rational agent and an object. This paper aims to specify VoxML as an annotation language in general abstract terms. It then shows how it works on annotating linguistic data that express visually perceptible human-object interactions. The annotation structures thus generated will be interpreted against the enriched minimal model created by VoxML as a modeling language while supporting the modeling purposes of VoxML linguistically.
CL · Dec 14, 2021
Representing Inferences and their Lexicalization
David McDonald, James Pustejovsky
We have recently begun a project to develop a more effective and efficient way to marshal inferences from background knowledge to facilitate deep natural language understanding. The meaning of a word is taken to be the entities, predications, presuppositions, and potential inferences that it adds to an ongoing situation. As words compose, the minimal model in the situation evolves to limit and direct inference. At this point we have developed our computational architecture and implemented it on real text. Our focus has been on proving the feasibility of our design.
CL · May 12, 2021
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky, Eben Holderness, Jingxuan Tu et al.
In this paper, we argue that the design and development of multimodal datasets for natural language processing (NLP) challenges should be enhanced in two significant respects: to more broadly represent commonsense semantic inferences; and to better reflect the dynamics of actions and events, through a substantive alignment of textual and visual information. We identify challenges and tasks that are reflective of linguistic and cognitive competencies that humans have when speaking and reasoning, rather than merely the performance of systems on isolated tasks. We introduce the distinction between challenge-based tasks and competence-based performance, and describe a diagnostic dataset, Recipe-to-Video Questions (R2VQ), designed for testing competence-based comprehension over a multimodal recipe collection (http://r2vq.org/). The corpus contains detailed annotation supporting such inferencing tasks and facilitating a rich set of question families that we use to evaluate NLP systems.
AI · Dec 5, 2020
Neurosymbolic AI for Situated Language Understanding
Nikhil Krishnaswamy, James Pustejovsky
In recent years, data-intensive AI, particularly the domain of natural language processing and understanding, has seen significant progress driven by the advent of large datasets and deep neural networks that have sidelined more classic AI approaches to the field. These systems can apparently demonstrate sophisticated linguistic understanding or generation capabilities, but often fail to transfer their skills to situations they have not encountered before. We argue that computational situated grounding provides a solution to some of these learning challenges by creating situational representations that both serve as a formal model of the salient phenomena, and contain rich amounts of exploitable, task-appropriate data for training new, flexible computational models. Our model reincorporates some ideas of classic AI into a framework of neurosymbolic intelligence, using multimodal contextual modeling of interactive situations, events, and object properties. We discuss how situated grounding provides diverse data and multiple levels of modeling for a variety of AI learning challenges, including learning how to interact with object affordances, learning semantics for novel structures and configurations, and transferring such learned knowledge to new objects and situations.
RO · Jul 13, 2020
Situated Multimodal Control of a Mobile Robot: Navigation through a Virtual Environment
Katherine Krajovic, Nikhil Krishnaswamy, Nathaniel J. Dimick et al.
We present a new interface for controlling a navigation robot in novel environments using coordinated gesture and language. We use a TurtleBot3 robot with a LIDAR and a camera, an embodied simulation of what the robot has encountered while exploring, and a cross-platform bridge facilitating generic communication. A human partner can deliver instructions to the robot using spoken English and gestures relative to the simulated environment, to guide the robot through navigation tasks.
CL · Jul 3, 2020
Exploration and Discovery of the COVID-19 Literature through Semantic Visualization
Jingxuan Tu, Marc Verhagen, Brent Cochran et al.
We are developing semantic visualization techniques in order to enhance exploration and enable discovery over large datasets of complex networks of relations. Semantic visualization is a method of enabling exploration and discovery over large datasets of complex networks by exploiting the semantics of the relations in them. This involves (i) NLP to extract named entities, relations and knowledge graphs from the original data; (ii) indexing the output and creating representations for all relevant entities and relations that can be visualized in many different ways, e.g., as tag clouds, heat maps, graphs, etc.; (iii) applying parameter reduction operations to the extracted relations, creating "relation containers", or functional entities that can also be visualized using the same methods, allowing the visualization of multiple relations, partial pathways, and exploration across multiple dimensions. Our hope is that this will enable the discovery of novel inferences over relations in complex data that otherwise would go unnoticed. We have applied this to analysis of the recently released CORD-19 dataset.
CL · Jul 1, 2020
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang, Manling Li, Xuan Wang et al.
To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relations, and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence.
CL · Mar 16, 2020
A Formal Analysis of Multimodal Referring Strategies Under Common Ground
Nikhil Krishnaswamy, James Pustejovsky
In this paper, we present an analysis of computationally generated mixed-modality definite referring expressions using combinations of gesture and linguistic descriptions. In doing so, we expose some striking formal semantic properties of the interactions between gesture and language, conditioned on the introduction of content into the common ground between the (computational) speaker and (human) viewer, and demonstrate how these formal features can contribute to training better models to predict viewer judgment of referring expressions, and potentially to the generation of more natural and informative referring expressions.
CL · Oct 9, 2019
Assessing the Efficacy of Clinical Sentiment Analysis and Topic Extraction in Psychiatric Readmission Risk Prediction
Elena Alvarez-Mellado, Eben Holderness, Nicholas Miller et al.
Predicting which patients are more likely to be readmitted to a hospital within 30 days after discharge is a valuable piece of information in clinical decision-making. Building a successful readmission risk classifier based on the content of Electronic Health Records (EHRs) has proved, however, to be a challenging task. Previously explored features include mainly structured information, such as sociodemographic data, comorbidity codes and physiological variables. In this paper we assess incorporating additional clinically interpretable NLP-based features such as topic extraction and clinical sentiment analysis to predict early readmission risk in psychiatry patients.
HC · Sep 18, 2019
Multimodal Continuation-style Architectures for Human-Robot Interaction
Nikhil Krishnaswamy, James Pustejovsky
We present an architecture for integrating real-time, multimodal input into a computational agent's contextual model. Using a human-avatar interaction in a virtual world, we treat aligned gesture and speech as an ensemble where content may be communicated by either modality. With a modified nondeterministic pushdown automaton architecture, the computer system: (1) consumes input incrementally using continuation-passing style until it achieves sufficient understanding of the user's aim; (2) constructs and asks questions where necessary using established contextual information; and (3) keeps track of prior discourse items using multimodal cues. This type of architecture supports special cases of pushdown and finite state automata as well as integrating outputs from machine learning models. We present examples of this architecture's use in multimodal one-shot learning interactions of novel gestures and live action composition.
CL · Apr 5, 2019
Distinguishing Clinical Sentiment: The Importance of Domain Adaptation in Psychiatric Patient Health Records
Eben Holderness, Philip Cawkwell, Kirsten Bolton et al.
Recently natural language processing (NLP) tools have been developed to identify and extract salient risk indicators in electronic health records (EHRs). Sentiment analysis, although widely used in non-medical areas for improving decision making, has been studied minimally in the clinical setting. In this study, we undertook, to our knowledge, the first domain adaptation of sentiment analysis to psychiatric EHRs by defining psychiatric clinical sentiment, performing an annotation project, and evaluating multiple sentence-level sentiment machine learning (ML) models. Results indicate that off-the-shelf sentiment analysis tools fail in identifying clinically positive or negative polarity, and that the definition of clinical sentiment that we provide is learnable with relatively small amounts of training data. This project is an initial step towards further refining sentiment analysis methods for clinical use. Our long-term objective is to incorporate the results of this project as part of a machine learning model that predicts inpatient readmission risk. We hope that this work will initiate a discussion concerning domain adaptation of sentiment analysis to the clinical setting.
AI · Feb 5, 2019
Situational Grounding within Multimodal Simulations
James Pustejovsky, Nikhil Krishnaswamy
In this paper, we argue that simulation platforms enable a novel type of embodied spatial reasoning, one facilitated by a formal model of object and event semantics that renders the continuous quantitative search space of an open-world, real-time environment tractable. We provide examples for how a semantically-informed AI system can exploit the precise, numerical information provided by a game engine to perform qualitative reasoning about objects and events, facilitate learning novel concepts from data, and communicate with a human to improve its models and demonstrate its understanding. We argue that simulation environments, and game engines in particular, bring together many different notions of "simulation" and many different technologies to provide a highly-effective platform for developing both AI systems and tools to experiment in both machine and human intelligence.
AI · Nov 27, 2018
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise
Nikhil Krishnaswamy, Scott Friedman, James Pustejovsky
Many modern machine learning approaches require vast amounts of training data to learn new concepts; conversely, human learning often requires few examples--sometimes only one--from which the learner can abstract structural concepts. We present a novel approach to introducing new spatial structures to an AI agent, combining deep learning over qualitative spatial relations with various heuristic search algorithms. The agent extracts spatial relations from a sparse set of noisy examples of block-based structures, and trains convolutional and sequential models of those relation sets. To create novel examples of similar structures, the agent begins placing blocks on a virtual table, uses a CNN to predict the most similar complete example structure after each placement, an LSTM to predict the most likely set of remaining moves needed to complete it, and recommends one using heuristic search. We verify that the agent learned the concept by observing its virtual block-building activities, wherein it ranks each potential subsequent action toward building its learned concept. We empirically assess this approach with human participants' ratings of the block structures. Initial results and qualitative evaluations of structures generated by the trained agent show where it has generalized concepts from the training data, which heuristics perform best within the search space, and how we might improve learning and execution.
RO · Oct 1, 2018
Multimodal Interactive Learning of Primitive Actions
Tuan Do, Nikhil Krishnaswamy, Kyeongmin Rim et al.
We describe an ongoing project in learning to perform primitive actions from demonstrations using an interactive interface. In our previous work, we have used demonstrations captured from humans performing actions as training samples for a neural network-based trajectory model of actions to be performed by a computational agent in novel setups. We found that our original framework had some limitations that we hope to overcome by incorporating communication between the human and the computational agent, using the interaction between them to fine-tune the model learned by the machine. We propose a framework that uses multimodal human-computer interaction to teach action concepts to machines, making use of both live demonstration and communication through natural language, as two distinct teaching modalities, while requiring few training samples.
CL · Sep 15, 2018
Analysis of Risk Factor Domains in Psychosis Patient Health Records
Eben Holderness, Nicholas Miller, Philip Cawkwell et al.
Readmission after discharge from a hospital is disruptive and costly, regardless of the reason. However, it can be particularly problematic for psychiatric patients, so predicting which patients may be readmitted is critically important but also very difficult. Clinical narratives in psychiatric electronic health records (EHRs) span a wide range of topics and vocabulary; therefore, a psychiatric readmission prediction model must begin with a robust and interpretable topic extraction component. We created a data pipeline for using document vector similarity metrics to perform topic extraction on psychiatric EHR data in service of our long-term goal of creating a readmission risk classifier. We show initial results for our topic extraction model and identify additional features we will be incorporating in the future.
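As a rough illustration of what "document vector similarity metrics" for topic extraction can look like, the sketch below assigns a passage to the topic whose seed text it most resembles under cosine similarity. It is a minimal standard-library example under stated assumptions: the bag-of-words vectors, the `best_topic` helper, and the `TOPIC_SEEDS` entries are all hypothetical and are not the paper's pipeline, vectors, or risk factor domains.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (a stand-in for a real document vector)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_topic(passage, topic_seeds):
    """Return the (topic, score) pair whose seed text is most similar to the passage."""
    vec = bag_of_words(passage)
    scores = {
        name: cosine_similarity(vec, bag_of_words(seed))
        for name, seed in topic_seeds.items()
    }
    return max(scores.items(), key=lambda kv: kv[1])


# Hypothetical topic seeds, chosen only for illustration.
TOPIC_SEEDS = {
    "substance": "alcohol drug use substance",
    "occupation": "work job school employment",
}

print(best_topic("patient reports daily alcohol use", TOPIC_SEEDS))
```

In practice the count vectors would likely be replaced by learned document embeddings; the similarity-then-argmax structure stays the same.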
CV · Oct 2, 2017
Learning event representation: As sparse as possible, but not sparser
Tuan Do, James Pustejovsky
Selecting an optimal event representation is essential for event classification in real world contexts. In this paper, we investigate the application of qualitative spatial reasoning (QSR) frameworks for classification of human-object interaction in three dimensional space, in comparison with the use of quantitative feature extraction approaches for the same purpose. In particular, we modify QSRLib, a library that allows computation of Qualitative Spatial Relations and Calculi, and employ it for feature extraction, before inputting features into our neural network models. Using an experimental setup involving motion captures of human-object interaction as three dimensional inputs, we observe that the use of qualitative spatial features significantly improves the performance of our machine learning algorithm against our baseline, while quantitative features of similar kinds fail to deliver similar improvement. We also observe that sequential representations of QSR features yield the best classification performance. A result of our learning method is a simple approach to the qualitative representation of 3D activities as compositions of 2D actions that can be visualized and learned using 2-dimensional QSR.
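To make the contrast between quantitative and qualitative features concrete, the sketch below turns two 3D trajectories into a sequence of qualitative distance-change labels. It is a minimal example under stated assumptions: the function name, the three labels, and the tolerance are hypothetical, and it does not use or reproduce the QSRLib API or the specific calculi used in the paper.

```python
import math


def qualitative_distance_changes(traj_a, traj_b, tolerance=1e-6):
    """Label each consecutive frame pair by how the distance between two objects changes.

    traj_a and traj_b are equal-length sequences of (x, y, z) positions.
    Returns one label per frame transition: "approaching", "receding", or "stable".
    """
    # Quantitative step: Euclidean distance between the objects in every frame.
    distances = [math.dist(p, q) for p, q in zip(traj_a, traj_b)]

    # Qualitative step: keep only the sign of the change between frames.
    labels = []
    for previous, current in zip(distances, distances[1:]):
        delta = current - previous
        if delta < -tolerance:
            labels.append("approaching")
        elif delta > tolerance:
            labels.append("receding")
        else:
            labels.append("stable")
    return labels


# Hypothetical motion capture: a hand moves toward a stationary cup, then stops.
hand = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 0, 0)]
cup = [(3, 0, 0)] * 4
print(qualitative_distance_changes(hand, cup))
```

The resulting label sequence discards exact coordinates and keeps only the relational pattern, which is the kind of sparse, sequential input the abstract describes feeding to a classifier.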
CVSep 30, 2017
Fine-grained Event Learning of Human-Object Interaction with LSTM-CRFTuan Do, James Pustejovsky
Event learning is one of the most important problems in AI. However, notwithstanding significant research efforts, it is still a very complex task, especially when the events involve the interaction of humans or agents with other objects, as it requires modeling human kinematics and object movements. This study proposes a methodology for learning complex human-object interaction (HOI) events, involving the recording, annotation and classification of event interactions. For annotation, we allow multiple interpretations of a motion capture by slicing over its temporal span; for classification, we use Long Short-Term Memory (LSTM) sequential models with a Conditional Random Field (CRF) to constrain the outputs. Using a setup involving captures of human-object interaction as three-dimensional inputs, we argue that this approach could be used for event types involving complex spatio-temporal dynamics.
CLOct 6, 2016
Generating Simulations of Motion Events from Verbal DescriptionsJames Pustejovsky, Nikhil Krishnaswamy
In this paper, we describe a computational model for motion events in natural language that maps from linguistic expressions, through a dynamic event interpretation, into three-dimensional temporal simulations in a model. Starting with the model from (Pustejovsky and Moszkowicz, 2011), we analyze motion events using temporally-traced Labelled Transition Systems. We model the distinction between path- and manner-motion in an operational semantics, and further distinguish different types of manner-of-motion verbs in terms of the mereo-topological relations that hold throughout the process of movement. From these representations, we generate minimal models, which are realized as three-dimensional simulations in software developed with the game engine, Unity. The generated simulations act as a conceptual "debugger" for the semantics of different motion verbs: that is, by testing for consistency and informativeness in the model, simulations expose the presuppositions associated with linguistic expressions and their compositions. Because the model generation component is still incomplete, this paper focuses on an implementation which maps directly from linguistic interpretations into the Unity code snippets that create the simulations.
CLOct 5, 2016
VoxML: A Visualization Modeling LanguageJames Pustejovsky, Nikhil Krishnaswamy
We present the specification for a modeling language, VoxML, which encodes semantic knowledge of real-world objects represented as three-dimensional models, and of events and attributes related to and enacted over these objects. VoxML is intended to overcome the limitations of existing 3D visual markup languages by allowing for the encoding of a broad range of semantic knowledge that can be exploited by a variety of systems and platforms, leading to multimodal simulations of real-world scenarios using conceptual objects that represent their semantic values.
CLOct 3, 2016
Multimodal Semantic Simulations of Linguistically Underspecified Motion EventsNikhil Krishnaswamy, James Pustejovsky
In this paper, we describe a system for generating three-dimensional visual simulations of natural language motion expressions. We use a rich formal model of events and their participants to generate simulations that satisfy the minimal constraints entailed by the associated utterance, relying on semantic knowledge of physical objects and motion events. This paper outlines technical considerations and discusses implementing the aforementioned semantic models into such a system.
CVJul 10, 2016
Annotation Methodologies for Vision and Language Dataset CreationGitit Kehat, James Pustejovsky
Annotated datasets are commonly used in the training and evaluation of tasks involving natural language and vision (image description generation, action recognition and visual question answering). However, many of the existing datasets reflect problems that emerge in the process of data selection and annotation. Here we point out some of the difficulties and problems one confronts when creating and validating annotated vision and language datasets.
CLMar 19, 2014
Clinical TempEvalSteven Bethard, Leon Derczynski, James Pustejovsky et al.
We describe the Clinical TempEval task which is currently in preparation for the SemEval-2015 evaluation exercise. This task involves identifying and describing events, times and the relations between them in clinical text. Six discrete subtasks are included, focusing on recognising mentions of times and events, describing those mentions for both entity types, identifying the relation between an event and the document creation time, and identifying narrative container relations.
CLJun 22, 2012
TempEval-3: Evaluating Events, Time Expressions, and Temporal RelationsNaushad UzZaman, Hector Llorens, James Allen et al.
We describe the TempEval-3 task which is currently in preparation for the SemEval-2013 evaluation exercise. The aim of TempEval is to advance research on temporal information processing. TempEval-3 follows on from previous TempEval events, incorporating: a three-part task structure covering event, temporal expression and temporal relation extraction; a larger dataset; and single overall task quality scores.