AI | Dec 1, 2024

Long text outline generation: Chinese text outline based on unsupervised framework and large language model

arXiv:2412.00810v1 | 1 citation | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses outline generation for long Chinese texts in document analysis applications and represents an incremental improvement over existing methods.

The paper tackles the problem of generating readable outlines for very long Chinese texts, such as fictional works, where existing methods struggle to segment chapters coherently. The proposed method, which combines an unsupervised framework with large models, outperforms several deep learning models and large models in segmentation accuracy and outline readability.

Outline generation aims to reveal the internal structure of a document by identifying underlying chapter relationships and generating corresponding chapter summaries. Although existing deep learning methods and large models perform well on small- and medium-sized texts, they struggle to produce readable outlines for very long texts (such as fictional works), often failing to segment chapters coherently. In this paper, we propose a novel outline generation method for Chinese that combines an unsupervised framework with large models. Specifically, the method first generates chapter feature graph data based on entity and syntactic dependency relationships. Then, a representation module based on graph attention layers learns deep embeddings of the chapter graph data. Using these chapter embeddings, we design an operator based on Markov chain principles to segment plot boundaries. Finally, we employ a large model to generate a summary of each plot segment and produce the overall outline. We evaluate the method on segmentation accuracy and outline readability, and it outperforms several deep learning models and large models in comparative evaluations.
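The segmentation step can be illustrated with a minimal sketch. The abstract does not specify the Markov-chain operator, so the code below is not the authors' method. It swaps in a simple stand-in: under a first-order assumption, each chapter is compared only with the chapter before it, and a plot boundary is placed wherever the cosine similarity of adjacent chapter embeddings falls below a threshold. The function names, the toy two-dimensional embeddings, and the threshold of 0.5 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_boundaries(embeddings, threshold=0.5):
    """Return each chapter index i at which a new plot segment starts.

    Assumption (not from the paper): a boundary is declared when the
    similarity between chapter i-1 and chapter i drops below `threshold`.
    Only the immediately preceding chapter is consulted (first-order).
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries


def group_chapters(n_chapters, boundaries):
    """Turn boundary indices into lists of chapter indices, one list per segment."""
    starts = [0] + boundaries
    ends = boundaries + [n_chapters]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Toy embeddings standing in for the learned chapter embeddings.
chapters = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bounds = segment_boundaries(chapters)
segments = group_chapters(len(chapters), bounds)
print(bounds)    # [2]
print(segments)  # [[0, 1], [2, 3]]
```

In the pipeline the abstract describes, each resulting segment would then be passed to a large model for summarization; that call is omitted here because the abstract names no specific model or API.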

