AI CL CV

Dec 13, 2020

Learning Contextual Causality from Time-consecutive Images

Hongming Zhang, Yintong Huo, Xinran Zhao, Yangqiu Song, Dan Roth

arXiv:2012.07138v15.76 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of acquiring causality knowledge for AI systems at scale and in context. Current text-based acquisition methods rely on expensive human annotation and produce knowledge that ignores context.

This paper explores learning contextual causality from visual signals in time-consecutive images, moving beyond text-based methods. The authors propose a dataset, Vis-Causal, and show that, given good language and visual representation models and enough training signals, meaningful causal knowledge can be discovered automatically from videos. Their analysis also highlights the importance of context.

Causality knowledge is crucial for many artificial intelligence systems. Conventional text-based methods for acquiring causality knowledge typically require laborious and expensive human annotation, so their scale is often limited. Moreover, because no context is provided during annotation, the resulting causality knowledge records (e.g., ConceptNet) typically do not take context into consideration. To explore a more scalable way of acquiring causality knowledge, in this paper we move beyond the textual domain and investigate the possibility of learning contextual causality from the visual signal. Compared with purely text-based approaches, learning causality from the visual signal has the following advantages: (1) causality knowledge is a form of commonsense knowledge, which is rarely expressed in text but is abundant in videos; (2) most events in a video are naturally time-ordered, which provides a rich resource from which to mine causality knowledge; (3) all the objects in a video can be used as context to study the contextual property of causal relations. Specifically, we first propose a high-quality dataset, Vis-Causal, and then conduct experiments to demonstrate that, with good language and visual representation models and enough training signals, it is possible to automatically discover meaningful causal knowledge from videos. Further analysis shows that the contextual property of causal relations does exist and that taking it into consideration may be crucial for using causality knowledge in real applications. It also shows that the visual signal can serve as a good resource for learning such contextual causality.
