Reassessing Graph Linearization for Sequence-to-sequence AMR Parsing: On the Advantages and Limitations of Triple-Based Encoding
This work addresses graph linearization, a practical concern for AMR parsing researchers, but it is incremental: it compares existing methods without achieving superior results.
The paper tackles the problem of linearizing AMR graphs for sequence-to-sequence parsing. It proposes a triple-based encoding to address two limitations of the standard Penman method: closely related nodes can end up far apart in the linearized text, and inverse roles double the number of relation types. It finds, however, that triple encoding still underperforms Penman's concise representation.
Sequence-to-sequence models are widely used to train Abstract Meaning Representation (AMR; Banarescu et al., 2013) parsers. To train such models, AMR graphs have to be linearized into a one-line text format. While Penman encoding is typically used for this purpose, we argue that it has limitations: (1) for deep graphs, some closely related nodes are located far apart in the linearized text; and (2) Penman's tree-based encoding necessitates inverse roles to handle node re-entrancy, doubling the number of relation types to predict. To address these issues, we propose a triple-based linearization method and compare its efficiency with Penman linearization. Although triples are well suited to representing a graph, our results suggest that triple encoding needs further improvement to compete with Penman's concise and explicit representation of a nested graph structure.
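To make the contrast between the two linearizations concrete, the following minimal sketch serializes one toy graph both ways. Everything in it is an assumption made for illustration: the example graph (for "The boy wants to go"), the helper names `to_penman` and `to_triples`, and in particular the flat triple format with its ` | ` separator, which is hypothetical and not necessarily the encoding proposed in the paper. The Penman-style serializer is deliberately simplified: it only follows edges in their stored direction from the root, so it does not produce inverse roles.

```python
# Toy AMR for "The boy wants to go" (illustrative; not taken from the paper).
INSTANCES = {"w": "want-01", "b": "boy", "g": "go-02"}
EDGES = [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")]


def to_penman(root, instances, edges):
    """Nested, depth-first serialization in the style of Penman notation.

    A node that was already written is emitted as a bare variable, which is
    how the re-entrant node `b` is referenced the second time.
    """
    seen = set()

    def visit(var):
        if var in seen:
            return var  # re-entrancy: refer back by variable only
        seen.add(var)
        parts = [f"{var} / {instances[var]}"]
        for src, role, tgt in edges:
            if src == var:
                parts.append(f"{role} {visit(tgt)}")
        return "(" + " ".join(parts) + ")"

    return visit(root)


def to_triples(instances, edges, sep=" | "):
    """Flat serialization: one 'source role target' item per instance and edge.

    The item layout and the separator are assumptions for this sketch.
    """
    items = [f"{var} :instance {concept}" for var, concept in instances.items()]
    items += [f"{src} {role} {tgt}" for src, role, tgt in edges]
    return sep.join(items)


print(to_penman("w", INSTANCES, EDGES))
print(to_triples(INSTANCES, EDGES))
```

On this graph the nested form is `(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))`, while the flat form lists six items, repeating each variable once per triple it takes part in. That repetition gives a rough sense of why the nested encoding is the more concise of the two.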