CL · Oct 30, 2022

Transfer Learning with Synthetic Corpora for Spatial Role Labeling and Reasoning

arXiv:2210.16952v225.0311 citations · h-index: 12 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses spatial language processing for AI applications, but it is incremental as it builds on existing synthetic data methods for transfer learning.

The authors tackled the problem of spatial language processing by introducing two new datasets for spatial question answering and spatial role labeling, and demonstrated that pretraining with synthetic data significantly improves state-of-the-art results on benchmarks, especially with limited target domain data.

Recent research shows that synthetic data as a source of supervision helps pretrained language models (PLMs) transfer to new target tasks and domains. However, this idea is less explored for spatial language. We provide two new data resources for multiple spatial language processing tasks. The first dataset is synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL). Compared to previous SQA datasets, it includes a larger variety of spatial relation types and spatial expressions. Our data generation process is easily extendable with new spatial expression lexicons. The second is a real-world SQA dataset with human-generated questions, built on an existing corpus with SpRL annotations. This dataset can be used to evaluate spatial language processing models in realistic situations. We show that pretraining with automatically generated data significantly improves the SOTA results on several SQA and SpRL benchmarks, particularly when the training data in the target domain is small.
