RONov 6, 2025
GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA PoliciesMaëlic Neau, Zoe Falomir, Paulo E. Santos et al.
Deploying autonomous robots that can learn new skills from demonstrations is an important challenge of modern robotics. Existing solutions often apply end-to-end imitation learning with Vision-Language Action (VLA) models or symbolic approaches with Action Model Learning (AML). On the one hand, current VLA models are limited by the lack of high-level symbolic planning, which hinders their abilities in long-horizon tasks. On the other hand, symbolic approaches in AML lack generalization and scalability perspectives. In this paper we present a new neuro-symbolic approach, GraSP-VLA, a framework that uses a Continuous Scene Graph representation to generate a symbolic representation of human demonstrations. This representation is used to generate new planning domains during inference and serves as an orchestrator for low-level VLA policies, scaling up the number of actions that can be reproduced in a row. Our results show that GraSP-VLA is effective for modeling symbolic representations on the task of automatic planning domain generation from observations. In addition, results on real-world experiments show the potential of our Continuous Scene Graph representation to orchestrate low-level VLA policies in long-horizon tasks.
CVMar 6Code
REACT++: Efficient Cross-Attention for Real-Time Scene Graph GenerationMaëlic Neau, Zoe Falomir
Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we build on the powerful Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation (REACT) architecture and propose REACT++, a new state-of-the-art model for real-time SGG. By leveraging efficient feature extraction and subject-to-object cross-attention within the prototype space, REACT++ balances latency and representational power. REACT++ achieves the highest inference speed among existing SGG models, improving relation prediction accuracy without sacrificing object detection performance. Compared to the previous REACT version, REACT++ is 20% faster with a gain of 10% in relation prediction accuracy on average. The code is available at https://github.com/Maelic/SGG-Benchmark.
CVSep 1, 2025Code
Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph GenerationMaëlic Neau, Zoe Falomir, Cédric Buche et al.
Scene Graph Generation (SGG) encodes visual relationships between objects in images as graph structures. Thanks to the advances of Vision-Language Models (VLMs), the task of Open-Vocabulary SGG has been recently proposed where models are evaluated on their functionality to learn a wide and diverse range of relations. Current benchmarks in SGG, however, possess a very limited vocabulary, making the evaluation of open-source models inefficient. In this paper, we propose a new reference-free metric to fairly evaluate the open-vocabulary capabilities of VLMs for relation prediction. Another limitation of Open-Vocabulary SGG is the reliance on weakly supervised data of poor quality for pre-training. We also propose a new solution for quickly generating high-quality synthetic data through region-specific prompt tuning of VLMs. Experimental results show that pre-training with this new data split can benefit the generalization capabilities of Open-Voc SGG models.
AIJan 13
A Qualitative Model to Reason about Object Rotations (QOR) applied to solve the Cube Comparison Test (CCT)Zoe Falomir
This paper presents a Qualitative model for Reasoning about Object Rotations (QOR) which is applied to solve the Cube Comparison Test (CCT) by Ekstrom et al. (1976). A conceptual neighborhood graph relating the Rotation movement to the Location change and the Orientation change (CNGRLO) of the features on the cube sides has been built and it produces composition tables to calculate inferences for reasoning about rotations.
AIJan 14, 2019
Proceedings of the 2nd Symposium on Problem-solving, Creativity and Spatial Reasoning in Cognitive Systems, ProSocrates 2017Ana-Maria Olteteanu, Zoe Falomir
This book contains the accepted papers at ProSocrates 2017 Symposium: Problem-solving,Creativity and Spatial Reasoning in Cognitive Systems. ProSocrates 2017 symposium was held at the Hansewissenschaftkolleg (HWK) of Advanced Studies in Delmenhorst, 20-21July 2017. This was the second edition of this symposium which aims to bring together researchers interested in spatial reasoning, problem solving and creativity.