CL AI Oct 28, 2025

LASTIST: LArge-Scale Target-Independent STance dataset

DongJae Kim, Yaejin Lee, Minsu Park, Eunil Park

arXiv:2510.25783v1 h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses the limited benchmark data for stance detection in Korean, enabling research in low-resource languages, though it is incremental in that it applies existing methods to new data.

The study tackles the lack of large-scale, target-independent stance detection datasets in low-resource languages such as Korean by creating the LASTIST dataset, which contains 563,299 labeled Korean sentences from political press releases, and by training state-of-the-art models on it.

Stance detection has emerged as an area of research in artificial intelligence. However, most research currently centers on target-dependent stance detection, which determines whether a person is in favor of or against a specific target. Furthermore, most benchmark datasets are in English, making it difficult to develop models for low-resource languages such as Korean, especially in an emerging field such as stance detection. This study proposes the LArge-Scale Target-Independent STance (LASTIST) dataset to fill this research gap. Collected from the press releases of two Korean political parties, the LASTIST dataset contains 563,299 labeled Korean sentences. We provide a detailed description of how we collected and constructed the dataset, and we train state-of-the-art deep learning and stance detection models on it. The LASTIST dataset is designed for various stance detection tasks, including target-independent stance detection and diachronic evolution stance detection. The dataset is available at https://anonymous.4open.science/r/LASTIST-3721/.
