3.7CVJun 4
Visual Commonsense Driven Knowledge Refinements for Scene Graph GenerationMaëlic Neau, Salim Baloch, Jakob Suchan et al.
Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model-agnostic, semantically-guided knowledge refinement framework that systematically mines commonsense-grounded constraints from training data - capturing spatial, functional, and qualitative relational regularities - and uses general declarative commonsense reasoning to correct and refine ranked SGG predictions at inference time. The framework requires no manual rule authoring, no model retraining, and transfers across datasets and architectures. On three standard benchmarks, we obtain consistent improvements over strong baselines, demonstrating that structured visual commonsense reasoning over deep scene semantics is a practical and effective complement to purely learning-based scene graph generation.
AIJun 22, 2013Code
Cognitive Interpretation of Everyday Activities: Toward Perceptual Narrative Based Visuo-Spatial Scene InterpretationMehul Bhatt, Jakob Suchan, Carl Schultz
We position a narrative-centred computational model for high-level knowledge representation and reasoning in the context of a range of assistive technologies concerned with "visuo-spatial perception and cognition" tasks. Our proposed narrative model encompasses aspects such as \emph{space, events, actions, change, and interaction} from the viewpoint of commonsense reasoning and learning in large-scale cognitive systems. The broad focus of this paper is on the domain of "human-activity interpretation" in smart environments, ambient intelligence etc. In the backdrop of a "smart meeting cinematography" domain, we position the proposed narrative model, preliminary work on perceptual narrativisation, and the immediate outlook on constructing general-purpose open-source tools for perceptual narrativisation. ACM Classification: I.2 Artificial Intelligence: I.2.0 General -- Cognitive Simulation, I.2.4 Knowledge Representation Formalisms and Methods, I.2.10 Vision and Scene Understanding: Architecture and control structures, Motion, Perceptual reasoning, Shape, Video analysis General keywords: cognitive systems; human-computer interaction; spatial cognition and computation; commonsense reasoning; spatial and temporal reasoning; assistive technologies
AIJun 19, 2025
A Community-driven vision for a new Knowledge Resource for AIVinay K Chaudhri, Chaitan Baru, Brandon Bennett et al.
The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language models struggle due to knowledge gaps; robotic planning lacks necessary world knowledge; and the detection of factually false information relies heavily on human expertise. What kind of knowledge resource is most needed in AI today? How can modern technology shape its development and evaluation? A recent AAAI workshop gathered over 50 researchers to explore these questions. This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure. In addition to leveraging contemporary advances in knowledge representation and reasoning, one promising idea is to build an open engineering framework to exploit knowledge modules effectively within the context of practical applications. Such a framework should include sets of conventions and social structures that are adopted by contributors.
AIDec 28, 2020
Commonsense Visual Sensemaking for Autonomous Driving: On Generalised Neurosymbolic Online Abduction Integrating Vision and SemanticsJakob Suchan, Mehul Bhatt, Srikrishna Varadarajan
We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for realtime perception and control. We evaluate and demonstrate with community established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As use-case, we focus on the significance of human-centred visual sensemaking -- e.g., involving semantic representation and explainability, question-answering, commonsense interpolation -- in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations. Keywords: Cognitive Vision, Deep Semantics, Declarative Spatial Reasoning, Knowledge Representation and Reasoning, Commonsense Reasoning, Visual Abduction, Answer Set Programming, Autonomous Driving, Human-Centred Computing and Design, Standardisation in Driving Technology, Spatial Cognition and AI.
CYDec 19, 2020
Visuo-Locomotive Complexity as a Component of Parametric Systems for Architecture DesignVasiliki Kondyli, Mehul Bhatt, Evgenia Spyridonos
A people-centred approach for designing large-scale built-up spaces necessitates systematic anticipation of user's embodied visuo-locomotive experience from the viewpoint of human-environment interaction factors pertaining to aspects such as navigation, wayfinding, usability. In this context, we develop a behaviour-based visuo-locomotive complexity model that functions as a key correlate of cognitive performance vis-a-vis internal navigation in built-up spaces. We also demonstrate the model's implementation and application as a parametric tool for the identification and manipulation of the architectural morphology along a navigation path as per the parameters of the proposed visuospatial complexity model. We present examples based on an empirical study in two healthcare buildings, and showcase the manner in which a dynamic and interactive parametric (complexity) model can promote behaviour-based decision-making throughout the design process to maintain desired levels of visuospatial complexity as part of a navigation or wayfinding experience.
AIMay 29, 2020
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday DrivingVasiliki Kondyli, Mehul Bhatt, Jakob Suchan
We develop a human-centred, cognitive model of visuospatial complexity in everyday, naturalistic driving conditions. With a focus on visual perception, the model incorporates quantitative, structural, and dynamic attributes identifiable in the chosen context; the human-centred basis of the model lies in its behavioural evaluation with human subjects with respect to psychophysical measures pertaining to embodied visuoauditory attention. We report preliminary steps to apply the developed cognitive model of visuospatial complexity for human-factors guided dataset creation and benchmarking, and for its use as a semantic template for the (explainable) computational analysis of visuospatial complexity.
AIMay 31, 2019
Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous DrivingJakob Suchan, Mehul Bhatt, Srikrishna Varadarajan
We demonstrate the need and potential of systematically integrated vision and semantics} solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking ---e.g., semantic representation and explainability, question-answering, commonsense interpolation--- in safety-critical autonomous driving situations.
CVMay 31, 2018
Semantic Analysis of (Reflectional) Visual Symmetry: A Human-Centred Computational Model for Declarative ExplainabilityJakob Suchan, Mehul Bhatt, Srikrishna Vardarajan et al.
We present a computational model for the semantic interpretation of symmetry in naturalistic scenes. Key features include a human-centred representation, and a declarative, explainable interpretation model supporting deep semantic question-answering founded on an integration of methods in knowledge representation and deep learning based computer vision. In the backdrop of the visual arts, we showcase the framework's capability to generate human-centred, queryable, relational structures, also evaluating the framework with an empirical study on the human perception of visual symmetry. Our framework represents and is driven by the application of foundational, integrated Vision and Knowledge Representation and Reasoning methods for applications in the arts, and the psychological and social sciences.
AIMay 17, 2018
Answer Set Programming Modulo `Space-Time'Carl Schultz, Mehul Bhatt, Jakob Suchan et al.
We present ASP Modulo `Space-Time', a declarative representational and computational framework to perform commonsense reasoning about regions with both spatial and temporal components. Supported are capabilities for mixed qualitative-quantitative reasoning, consistency checking, and inferring compositions of space-time relations; these capabilities combine and synergise for applications in a range of AI application areas where the processing and interpretation of spatio-temporal data is crucial. The framework and resulting system is the only general KR-based method for declaratively reasoning about the dynamics of `space-time' regions as first-class objects. We present an empirical evaluation (with scalability and robustness results), and include diverse application examples involving interpretation and control tasks.
AIDec 3, 2017
Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning about Moving ObjectsJakob Suchan, Mehul Bhatt, Przemysław Wałęga et al.
We propose a hybrid architecture for systematically computing robust visual explanation(s) encompassing hypothesis formation, belief revision, and default reasoning with video data. The architecture consists of two tightly integrated synergistic components: (1) (functional) answer set programming based abductive reasoning with space-time tracklets as native entities; and (2) a visual processing pipeline for detection based object tracking and motion analysis. We present the formal framework, its general implementation as a (declarative) method in answer set programming, and an example application and evaluation based on two diverse video datasets: the MOTChallenge benchmark developed by the vision community, and a recently developed Movie Dataset.
ROOct 10, 2017
Deep Semantic Abstractions of Everyday Human Activities: On Commonsense Representations of Human InteractionsJakob Suchan, Mehul Bhatt
We propose a deep semantic characterization of space and motion categorically from the viewpoint of grounding embodied human-object interactions. Our key focus is on an ontological model that would be adept to formalisation from the viewpoint of commonsense knowledge representation, relational learning, and qualitative reasoning about space and motion in cognitive robotics settings. We demonstrate key aspects of the space & motion ontology and its formalization as a representational framework in the backdrop of select examples from a dataset of everyday activities. Furthermore, focussing on human-object interaction data obtained from RGBD sensors, we also illustrate how declarative (spatio-temporal) reasoning in the (constraint) logic programming family may be performed with the developed deep semantic abstractions.
ROSep 15, 2017
Commonsense Scene Semantics for Cognitive Robotics: Towards Grounding Embodied Visuo-Locomotive InteractionsJakob Suchan, Mehul Bhatt
We present a commonsense, qualitative model for the semantic grounding of embodied visuo-spatial and locomotive interactions. The key contribution is an integrative methodology combining low-level visual processing with high-level, human-centred representations of space and motion rooted in artificial intelligence. We demonstrate practical applicability with examples involving object interactions, and indoor movement.
AIAug 9, 2016
Deeply Semantic Inductive Spatio-Temporal LearningJakob Suchan, Mehul Bhatt, Carl Schultz
We present an inductive spatio-temporal learning framework rooted in inductive logic programming. With an emphasis on visuo-spatial language, logic, and cognition, the framework supports learning with relational spatio-temporal features identifiable in a range of domains involving the processing and interpretation of dynamic visuo-spatial imagery. We present a prototypical system, and an example application in the domain of computing for visual arts and computational cognitive science.
CLJul 26, 2016
Grounding Dynamic Spatial Relations for Embodied (Robot) InteractionMichael Spranger, Jakob Suchan, Mehul Bhatt et al.
This paper presents a computational model of the processing of dynamic spatial relations occurring in an embodied robotic interaction setup. A complete system is introduced that allows autonomous robots to produce and interpret dynamic spatial phrases (in English) given an environment of moving objects. The model unites two separate research strands: computational cognitive semantics and on commonsense spatial representation and reasoning. The model for the first time demonstrates an integration of these different strands.
AIJul 20, 2016
Robust Natural Language Processing - Combining Reasoning, Cognitive Semantics and Construction Grammar for Spatial LanguageMichael Spranger, Jakob Suchan, Mehul Bhatt
We present a system for generating and understanding of dynamic and static spatial relations in robotic interaction setups. Robots describe an environment of moving blocks using English phrases that include spatial relations such as "across" and "in front of". We evaluate the system in robot-robot interactions and show that the system can robustly deal with visual perception errors, language omissions and ungrammatical utterances.
AIJun 25, 2016
Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo TheoriesPrzemysław Andrzej Wałęga, Carl Schultz, Mehul Bhatt
The systematic modelling of dynamic spatial systems is a key requirement in a wide range of application areas such as commonsense cognitive robotics, computer-aided architecture design, and dynamic geographic information systems. We present ASPMT(QS), a novel approach and fully-implemented prototype for non-monotonic spatial reasoning -a crucial requirement within dynamic spatial systems- based on Answer Set Programming Modulo Theories (ASPMT). ASPMT(QS) consists of a (qualitative) spatial representation module (QS) and a method for turning tight ASPMT instances into Satisfiability Modulo Theories (SMT) instances in order to compute stable models by means of SMT solvers. We formalise and implement concepts of default spatial reasoning and spatial frame axioms. Spatial reasoning is performed by encoding spatial relations as systems of polynomial constraints, and solving via SMT with the theory of real nonlinear arithmetic. We empirically evaluate ASPMT(QS) in comparison with other contemporary spatial reasoning systems both within and outside the context of logic programming. ASPMT(QS) is currently the only existing system that is capable of reasoning about indirect spatial effects (i.e., addressing the ramification problem), and integrating geometric and qualitative spatial information within a non-monotonic spatial reasoning context. This paper is under consideration for publication in TPLP.
AIAug 13, 2015
Talking about the Moving Image: A Declarative Model for Image Schema Based Embodied Perception Grounding and Language GenerationJakob Suchan, Mehul Bhatt, Harshita Jhavar
We present a general theory and corresponding declarative model for the embodied grounding and natural language based analytical summarisation of dynamic visuo-spatial imagery. The declarative model ---ecompassing spatio-linguistic abstractions, image schemas, and a spatio-temporal feature based language generator--- is modularly implemented within Constraint Logic Programming (CLP). The implemented model is such that primitives of the theory, e.g., pertaining to space and motion, image schemata, are available as first-class objects with `deep semantics' suited for inference and query. We demonstrate the model with select examples broadly motivated by areas such as film, design, geography, smart environments where analytical natural language based externalisations of the moving image are central from the viewpoint of human interaction, evidence-based qualitative analysis, and sensemaking. Keywords: moving image, visual semantics and embodiment, visuo-spatial cognition and computation, cognitive vision, computational models of narrative, declarative spatial reasoning
AIJun 16, 2015
Spatial Symmetry Driven Pruning Strategies for Efficient Declarative Spatial ReasoningCarl Schultz, Mehul Bhatt
Declarative spatial reasoning denotes the ability to (declaratively) specify and solve real-world problems related to geometric and qualitative spatial representation and reasoning within standard knowledge representation and reasoning (KR) based methods (e.g., logic programming and derivatives). One approach for encoding the semantics of spatial relations within a declarative programming framework is by systems of polynomial constraints. However, solving such constraints is computationally intractable in general (i.e. the theory of real-closed fields). We present a new algorithm, implemented within the declarative spatial reasoning system CLP(QS), that drastically improves the performance of deciding the consistency of spatial constraint graphs over conventional polynomial encodings. We develop pruning strategies founded on spatial symmetries that form equivalence classes (based on affine transformations) at the qualitative spatial level. Moreover, pruning strategies are themselves formalised as knowledge about the properties of space and spatial symmetries. We evaluate our algorithm using a range of benchmarks in the class of contact problems, and proofs in mereology and geometry. The empirical results show that CLP(QS) with knowledge-based spatial pruning outperforms conventional polynomial encodings by orders of magnitude, and can thus be applied to problems that are otherwise unsolvable in practice.
AIJun 16, 2015
ASPMT(QS): Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo TheoriesPrzemysław Andrzej Wałęga, Mehul Bhatt, Carl Schultz
The systematic modelling of \emph{dynamic spatial systems} [9] is a key requirement in a wide range of application areas such as comonsense cognitive robotics, computer-aided architecture design, dynamic geographic information systems. We present ASPMT(QS), a novel approach and fully-implemented prototype for non-monotonic spatial reasoning ---a crucial requirement within dynamic spatial systems-- based on Answer Set Programming Modulo Theories (ASPMT). ASPMT(QS) consists of a (qualitative) spatial representation module (QS) and a method for turning tight ASPMT instances into Sat Modulo Theories (SMT) instances in order to compute stable models by means of SMT solvers. We formalise and implement concepts of default spatial reasoning and spatial frame axioms using choice formulas. Spatial reasoning is performed by encoding spatial relations as systems of polynomial constraints, and solving via SMT with the theory of real nonlinear arithmetic. We empirically evaluate ASPMT(QS) in comparison with other prominent contemporary spatial reasoning systems. Our results show that ASPMT(QS) is the only existing system that is capable of reasoning about indirect spatial effects (i.e. addressing the ramification problem), and integrating geometric and qualitative spatial information within a non-monotonic spatial reasoning context.
AIJul 11, 2013
Between Sense and Sensibility: Declarative narrativisation of mental models as a basis and benchmark for visuo-spatial cognition and computation focussed collaborative cognitive systemsMehul Bhatt
What lies between `\emph{sensing}' and `\emph{sensibility}'? In other words, what kind of cognitive processes mediate sensing capability, and the formation of sensible impressions ---e.g., abstractions, analogies, hypotheses and theory formation, beliefs and their revision, argument formation--- in domain-specific problem solving, or in regular activities of everyday living, working and simply going around in the environment? How can knowledge and reasoning about such capabilities, as exhibited by humans in particular problem contexts, be used as a model and benchmark for the development of collaborative cognitive (interaction) systems concerned with human assistance, assurance, and empowerment? We pose these questions in the context of a range of assistive technologies concerned with \emph{visuo-spatial perception and cognition} tasks encompassing aspects such as commonsense, creativity, and the application of specialist domain knowledge and problem-solving thought processes. Assistive technologies being considered include: (a) human activity interpretation; (b) high-level cognitive rovotics; (c) people-centred creative design in domains such as architecture & digital media creation, and (d) qualitative analyses geographic information systems. Computational narratives not only provide a rich cognitive basis, but they also serve as a benchmark of functional performance in our development of computational cognitive assistance systems. We posit that computational narrativisation pertaining to space, actions, and change provides a useful model of \emph{visual} and \emph{spatio-temporal thinking} within a wide-range of problem-solving tasks and application areas where collaborative cognitive systems could serve an assistive and empowering function.
AIJul 9, 2013
Geospatial Narratives and their Spatio-Temporal Dynamics: Commonsense Reasoning for High-level Analyses in Geographic Information SystemsMehul Bhatt, Jan Oliver Wallgruen
The modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raises fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings. We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and within the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solutions techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science. Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems
AIJun 5, 2013
ROTUNDE - A Smart Meeting Cinematography Initiative: Tools, Datasets, and Benchmarks for Cognitive Interpretation and ControlMehul Bhatt, Jakob Suchan, Christian Freksa
We construe smart meeting cinematography with a focus on professional situations such as meetings and seminars, possibly conducted in a distributed manner across socio-spatially separated groups. The basic objective in smart meeting cinematography is to interpret professional interactions involving people, and automatically produce dynamic recordings of discussions, debates, presentations etc in the presence of multiple communication modalities. Typical modalities include gestures (e.g., raising one's hand for a question, applause), voice and interruption, electronic apparatus (e.g., pressing a button), movement (e.g., standing-up, moving around) etc. ROTUNDE, an instance of smart meeting cinematography concept, aims to: (a) develop functionality-driven benchmarks with respect to the interpretation and control capabilities of human-cinematographers, real-time video editors, surveillance personnel, and typical human performance in everyday situations; (b) Develop general tools for the commonsense cognitive interpretation of dynamic scenes from the viewpoint of visuo-spatial cognition centred perceptual narrativisation. Particular emphasis is placed on declarative representations and interfacing mechanisms that seamlessly integrate within large-scale cognitive (interaction) systems and companion technologies consisting of diverse AI sub-components. For instance, the envisaged tools would provide general capabilities for high-level commonsense reasoning about space, events, actions, change, and interaction.
AIJun 4, 2013
Narrative based Postdictive Reasoning for Cognitive RoboticsManfred Eppe, Mehul Bhatt
Making sense of incomplete and conflicting narrative knowledge in the presence of abnormalities, unobservable processes, and other real world considerations is a challenge and crucial requirement for cognitive robotics systems. An added challenge, even when suitably specialised action languages and reasoning systems exist, is practical integration and application within large-scale robot control frameworks. In the backdrop of an autonomous wheelchair robot control task, we report on application-driven work to realise postdiction triggered abnormality detection and re-planning for real-time robot control: (a) Narrative-based knowledge about the environment is obtained via a larger smart environment framework; and (b) abnormalities are postdicted from stable-models of an answer-set program corresponding to the robot's epistemic model. The overall reasoning is performed in the context of an approximate epistemic action theory based planner implemented via a translation to answer-set programming.
AIApr 17, 2013
h-approximation: History-Based Approximation of Possible World Semantics as ASPManfred Eppe, Mehul Bhatt, Frank Dylla
We propose an approximation of the Possible Worlds Semantics (PWS) for action planning. A corresponding planning system is implemented by a transformation of the action specification to an Answer-Set Program. A novelty is support for postdiction wrt. (a) the plan existence problem in our framework can be solved in NP, as compared to $Σ_2^P$ for non-approximated PWS of Baral(2000); and (b) the planner generates optimal plans wrt. a minimal number of actions in $Δ_2^P$. We demo the planning system with standard problems, and illustrate its integration in a larger software framework for robot control in a smart home.