CL · May 13, 2022

TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages

arXiv:2205.06435v1628 citations · h-index: 23 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of effectively exploiting the topological information in web pages for structural reading comprehension. It is an incremental improvement over prior methods, which used HTML tags or XPaths.

The paper tackles structural reading comprehension on web pages by proposing a Topological Information Enhanced model (TIE) that transforms the token-level task into a tag-level task and integrates a Graph Attention Network with a pre-trained language model, achieving state-of-the-art performance on the WebSRC benchmark at the time of writing.

Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interest. Although previous SRC work has leveraged extra information such as HTML tags or XPaths, the informative topology of web pages has not been effectively exploited. In this work, we propose a Topological Information Enhanced model (TIE), which transforms the token-level task into a tag-level task by introducing a two-stage process (i.e., node locating and answer refining). Building on this, TIE integrates a Graph Attention Network (GAT) and a Pre-trained Language Model (PLM) to leverage the topological information of both logical and spatial structures. Experimental results demonstrate that our model outperforms strong baselines and achieves state-of-the-art performance on the web-based SRC benchmark WebSRC at the time of writing. The code of TIE will be publicly available at https://github.com/X-LANCE/TIE.
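The two-stage process named in the abstract (node locating, then answer refining) can be illustrated with a minimal sketch. This is not the authors' implementation: `TagNode`, `locate_node`, `refine_answer`, and `two_stage_answer` are hypothetical names, and the hard-coded scores stand in for whatever a trained model would output. The sketch only shows how picking a tag node first restricts the token-level span search to that node's tokens.

```python
from dataclasses import dataclass


@dataclass
class TagNode:
    """Hypothetical DOM node covering tokens[start:end] (end exclusive)."""
    tag: str
    start: int
    end: int


def locate_node(node_scores):
    """Stage 1 (node locating): index of the highest-scoring tag node."""
    return max(range(len(node_scores)), key=lambda i: node_scores[i])


def refine_answer(node, start_scores, end_scores, max_len=30):
    """Stage 2 (answer refining): best (start, end) span inside the node.

    The returned end index is inclusive; returns None for an empty node.
    """
    best, best_score = None, float("-inf")
    for s in range(node.start, node.end):
        for e in range(s, min(s + max_len, node.end)):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def two_stage_answer(tokens, nodes, node_scores, start_scores, end_scores):
    """Pick a node, then search for the answer span only within it."""
    node = nodes[locate_node(node_scores)]
    span = refine_answer(node, start_scores, end_scores)
    if span is None:
        return ""
    s, e = span
    return " ".join(tokens[s:e + 1])


# Toy page: one <div> containing two <span> elements (all values are made up).
tokens = ["Price", ":", "$", "42", "Color", ":", "red"]
nodes = [TagNode("div", 0, 7), TagNode("span", 0, 4), TagNode("span", 4, 7)]
node_scores = [0.1, 0.2, 0.9]  # assumed output of a node-level scorer
start_scores = [0.0, 0.0, 0.9, 0.8, 0.1, 0.0, 0.7]
end_scores = [0.0, 0.0, 0.0, 0.95, 0.0, 0.0, 0.6]

answer = two_stage_answer(tokens, nodes, node_scores, start_scores, end_scores)
print(answer)
```

In this toy input, the highest-scoring token span over the whole page lies in the first `span`, but the node scorer prefers the second one, so the refined answer is drawn only from that node's tokens.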

Code Implementations: 1 repo
