CL · Mar 6, 2021

JPS-daprinfo: A Dataset for Japanese Dialog Act Analysis and People-related Information Detection

arXiv:2103.11786 · Code available

Originality: Synthesis-oriented

AI Analysis

The dataset provides a resource for natural language processing researchers working on Japanese dialog analysis, but the contribution is incremental because it adds annotations to an existing corpus rather than collecting new data.

The authors tackled the lack of a labeled dataset for Japanese dialog act analysis and people-related information detection by annotating 20,130 sentences from 50 interview dialogues with 13 labels, resulting in a new dataset called JPS-daprinfo.

We annotated a spoken Japanese dataset (I-JAS) for text classification. I-JAS contains 50 interview dialogues of two-way Japanese conversation in which the participants discuss their past, present, and future. Each dialogue is 30 minutes long. From this dataset, we selected the interview dialogues of native Japanese speakers as the samples and annotated their sentences with 13 labels. The annotation was carried out by native Japanese speakers with prior data-annotation experience. The total number of annotated samples is 20,130.

