AI | CL | Oct 8, 2023

TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See

arXiv:2310.05210v137.5133 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses author stance analysis in argument mining by incorporating multimodal data (text plus images), an incremental advance over text-only methods.

The authors address argumentative stance classification on a shared-task dataset that pairs text with images containing both visual elements and optical characters. They developed TILFA, a unified framework for fusing text, image, and layout data, which took first place on the shared task's leaderboard for Argumentative Stance Classification.

A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets, which focus only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset that includes both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), is designed to handle this mixed data. It excels not only at understanding text but also at detecting optical characters and recognizing layout details in images. Our model significantly outperforms existing baselines, earning our team, KnowComp, first place on the leaderboard of the Argumentative Stance Classification subtask in this shared task.
