CL AI Mar 10, 2024

Fine-grainedly Synthesize Streaming Data Based On Large Language Models With Graph Structure Understanding For Data Sparsity

Xin Zhang, Linhai Zhang, Deyu Zhou, Guoqiang Xu

arXiv:2403.06139v1 1.93 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses data sparsity in sentiment analysis of user reviews on e-commerce platforms, and it represents an incremental improvement over existing LLM-based approaches.

The paper tackles sentiment analysis performance issues caused by sparse user data in e-commerce by proposing a framework that uses LLMs with enhanced graph understanding to synthesize supplementary data, achieving MSE reductions of 45.85%, 3.16%, and 62.21% on three datasets.

Due to the sparsity of user data, sentiment analysis on user reviews in e-commerce platforms often suffers from poor performance, especially when faced with extremely sparse user data or long-tail labels. Recently, the emergence of LLMs has introduced new solutions to such problems by leveraging graph structures to generate supplementary user profiles. However, previous approaches have not fully utilized the graph understanding capabilities of LLMs and have struggled to adapt to complex streaming data environments. In this work, we propose a fine-grained streaming data synthesis framework that categorizes sparse users into three categories: Mid-tail, Long-tail, and Extreme. Specifically, we design LLMs to comprehensively understand three key graph elements in streaming data, including Local-global Graph Understanding, Second-Order Relationship Extraction, and Product Attribute Understanding, which enables the generation of high-quality synthetic data to effectively address sparsity across different categories. Experimental results on three real datasets demonstrate significant performance improvements, with synthesized data contributing to MSE reductions of 45.85%, 3.16%, and 62.21%, respectively.
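The reported MSE reductions can be read as relative improvements over a baseline. The following is a minimal sketch of that arithmetic, assuming the percentages are computed as (baseline MSE - new MSE) / baseline MSE. The function names and the toy ratings are hypothetical and are not taken from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences of ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mse, new_mse):
    """Percentage by which new_mse is lower than baseline_mse (assumed definition)."""
    if baseline_mse <= 0:
        raise ValueError("baseline MSE must be positive")
    return 100.0 * (baseline_mse - new_mse) / baseline_mse


# Toy ratings, invented purely for illustration.
y_true = [5, 3, 4, 1]
pred_baseline = [4, 4, 4, 3]    # hypothetical model trained on sparse data only
pred_augmented = [5, 3, 4, 2]   # hypothetical model trained with synthetic data

baseline = mse(y_true, pred_baseline)
augmented = mse(y_true, pred_augmented)
reduction = relative_reduction(baseline, augmented)
print(f"baseline={baseline:.2f} augmented={augmented:.2f} reduction={reduction:.2f}%")
```

Under this reading, a reduction of 45.85% would mean the model trained with synthesized data has roughly 54% of the baseline's MSE on that dataset.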
