RO · CV · Apr 22

Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

arXiv:2604.21053 · 29.6 · h-index: 7

AI Analysis

For robotic manipulation understanding, this work provides an interpretable and uncertainty-aware symbolic state that improves next-primitive prediction and robustness over classical symbolic and end-to-end baselines.

This paper proposes eSEC-LAM, a neuro-symbolic framework that enriches Semantic Event Chains with confidence-aware predicates and affordance priors for manipulation understanding. On EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101, it achieves competitive action recognition, substantially improves next-primitive prediction, and shows robustness to perception noise.

Robotic systems operating in human environments must reason about how object interactions evolve over time, which actions are currently being performed, and what manipulation step is likely to follow. Classical enriched Semantic Event Chains (eSECs) provide an interpretable relational description of manipulation, but remain primarily descriptive and do not directly support uncertainty-aware decision making. In this paper, we propose eSEC-LAM, a neuro-symbolic framework that transforms eSECs into an explicit event-level symbolic state for manipulation understanding. The proposed formulation augments classical eSECs with confidence-aware predicates, functional object roles, affordance priors, primitive-level abstraction, and saliency-guided explanation cues. These enriched symbolic states are derived from a foundation-model-based perception front-end through deterministic predicate extraction, while current-action inference and next-primitive prediction are performed using lightweight symbolic reasoning over primitive pre- and post-conditions. We evaluate the proposed framework on EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101 across action recognition, next-primitive prediction, robustness to perception noise, and explanation consistency. Experimental results show that eSEC-LAM achieves competitive action recognition, substantially improves next-primitive prediction, remains more robust under degraded perceptual conditions than both classical symbolic and end-to-end video baselines, and provides temporally consistent explanation traces grounded in explicit relational evidence. These findings demonstrate that enriched Semantic Event Chains can serve not only as interpretable descriptors of manipulation, but also as effective internal states for neuro-symbolic action reasoning.
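The abstract describes next-primitive prediction as lightweight symbolic reasoning over primitive pre-conditions applied to confidence-aware predicates. The sketch below illustrates that general idea only; it is not the paper's implementation. The predicate tuples, the `Primitive` class, the product-of-confidences scoring rule, and the example priors are all assumptions made for illustration, and post-conditions are left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical representation: a predicate is a (relation, subject, object)
# tuple, and the symbolic state maps each predicate to a confidence in [0, 1].


@dataclass(frozen=True)
class Primitive:
    name: str
    preconditions: frozenset  # predicates that should hold before the primitive


def score_primitive(state, primitive, prior=1.0):
    """Score = prior * product of precondition confidences.

    A predicate missing from the state is treated as confidence 0.0.
    This scoring rule is an illustrative assumption, not the paper's.
    """
    score = prior
    for predicate in primitive.preconditions:
        score *= state.get(predicate, 0.0)
    return score


def rank_next_primitives(state, primitives, priors=None):
    """Return (name, score) pairs sorted from most to least plausible."""
    priors = priors or {}
    scored = [
        (p.name, score_primitive(state, p, priors.get(p.name, 1.0)))
        for p in primitives
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy state: the hand holds a knife that hovers above, but barely touches, bread.
state = {
    ("touching", "hand", "knife"): 0.9,
    ("above", "knife", "bread"): 0.8,
    ("touching", "knife", "bread"): 0.2,
}

primitives = [
    Primitive("approach", frozenset({("touching", "hand", "knife"),
                                     ("above", "knife", "bread")})),
    Primitive("cut", frozenset({("touching", "hand", "knife"),
                                ("touching", "knife", "bread")})),
    Primitive("release", frozenset({("touching", "hand", "knife")})),
]

# Hypothetical affordance-style prior: releasing a knife mid-task is less likely.
priors = {"release": 0.5}

ranked = rank_next_primitives(state, primitives, priors)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Because each score is a product of explicit predicate confidences, a low-confidence relation (here, the knife touching the bread) directly lowers the ranking of any primitive that depends on it, which is one way noisy perception can be carried into the decision rather than thresholded away.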
