Synthetic Text Generation using Hypergraph Representations
The work addresses the need for controlled, diverse text generation in natural language processing applications. It appears incremental in that it builds on existing LLM-based methods, although it contributes a novel intermediate representation.
The paper tackles the problem of generating diverse synthetic document variants by decomposing documents into semantic frames modeled as hypergraphs, which allows principled perturbations to produce documents that vary in style, sentiment, format, composition, and facts.
Generating synthetic variants of a document is often posed as text-to-text transformation. We propose an alternative LLM-based method that first decomposes a document into semantic frames and then generates text from this sparse intermediate format. The frames are modeled using a hypergraph, which allows perturbing the frame contents in a principled manner. Specifically, new hyperedges are mined through topological analysis, and complex polyadic relationships, including hierarchy and temporal dynamics, are accommodated. We show that our solution generates documents that are diverse and coherent, and that vary in style, sentiment, format, composition and facts.
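
To make the frame-as-hypergraph idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the class `FrameHypergraph`, its methods, and the example frames are all hypothetical, and the edge-proposal rule (merging frames that share an element) is a simple stand-in for the topological analysis the abstract mentions, whose details are not given here. Each hyperedge stands for one semantic frame and connects any number of frame elements, which is what lets a single edge express a polyadic relationship.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class FrameHypergraph:
    """Hypothetical container: hyperedge (frame) name -> set of frame elements."""
    edges: dict[str, frozenset[str]] = field(default_factory=dict)

    def add_frame(self, name, elements):
        # A frame may connect any number of elements, not just two.
        self.edges[name] = frozenset(elements)

    def replace_element(self, name, old, new):
        """Perturb one frame by substituting a single element (e.g. a fact)."""
        members = self.edges[name]
        if old not in members:
            raise KeyError(f"{old!r} is not in frame {name!r}")
        self.edges[name] = (members - {old}) | {new}

    def candidate_edges(self):
        """Propose new hyperedges by merging frames that share an element.

        Assumption: a simple stand-in heuristic, not the paper's
        topological analysis.
        """
        proposals = set()
        for (_, first), (_, second) in combinations(self.edges.items(), 2):
            if first & second:
                merged = first | second
                if merged not in self.edges.values():
                    proposals.add(merged)
        return proposals


# Usage: two frames that share the element "laptop".
hg = FrameHypergraph()
hg.add_frame("purchase", {"Alice", "laptop", "Monday"})
hg.add_frame("delivery", {"laptop", "courier", "Friday"})

# Factual perturbation: change the day of the purchase.
hg.replace_element("purchase", "Monday", "Tuesday")

# Compositional perturbation: a candidate frame spanning both events.
new_frames = hg.candidate_edges()
```

In a pipeline of the kind the abstract describes, the perturbed frames would then be handed to an LLM to be rendered back into text; that step is omitted here because it depends on the model and prompt used.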