When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach
This work addresses data quality as a bottleneck for reliable analytics on text-attributed graphs (TAGs) and offers a systematic solution for data management and analytics applications, though the contribution is incremental because it builds on existing data-level optimization approaches.
The paper tackles poor data quality in text-attributed graphs (TAGs), which degrades the performance of graph neural networks (GNNs). It proposes LAGA, a multi-agent framework that improves the textual, structural, and label aspects of a graph. Experiments on 5 datasets against 16 baselines across 9 scenarios are reported to demonstrate its effectiveness, robustness, and scalability.
Text-attributed graphs (TAGs) have become a key form of graph-structured data in modern data management and analytics, combining structural relationships with rich textual semantics for diverse applications. However, the effectiveness of analytical models, particularly graph neural networks (GNNs), is highly sensitive to data quality. Our empirical analysis shows that both conventional and LLM-enhanced GNNs degrade notably under textual, structural, and label imperfections, underscoring TAG quality as a key bottleneck for reliable analytics. Existing studies have explored data-level optimization for TAGs, but most focus on specific degradation types and target a single aspect like structure or label, lacking a systematic and comprehensive perspective on data quality improvement. To address this gap, we propose LAGA (Large Language and Graph Agent), a unified multi-agent framework for comprehensive TAG quality optimization. LAGA formulates graph quality control as a data-centric process, integrating detection, planning, action, and evaluation agents into an automated loop. It holistically enhances textual, structural, and label aspects through coordinated multi-modal optimization. Extensive experiments on 5 datasets and 16 baselines across 9 scenarios demonstrate the effectiveness, robustness and scalability of LAGA, confirming the importance of data-centric quality optimization for reliable TAG analytics.
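The detect, plan, act, evaluate loop described above can be illustrated with a toy sketch. This is not LAGA's implementation: the abstract gives no interfaces, so every name here (`TAG`, `Node`, `detect`, `plan`, `act`, `evaluate`, `run`) is hypothetical, and simple rules (neighbor text concatenation, neighbor majority vote) stand in for whatever LLM-driven repairs the actual agents perform.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    text: str
    label: Optional[str] = None


@dataclass
class TAG:
    nodes: dict                      # node id -> Node
    edges: set = field(default_factory=set)  # undirected (u, v) pairs


def neighbors(g, n):
    """Valid neighbors of n, ignoring self-loops and dangling endpoints."""
    out = set()
    for u, v in g.edges:
        if u == v:
            continue
        if u == n and v in g.nodes:
            out.add(v)
        elif v == n and u in g.nodes:
            out.add(u)
    return sorted(out)


def detect(g):
    """Detection step (assumed rules): flag structural, textual, label issues."""
    issues = []
    for u, v in g.edges:
        if u == v or u not in g.nodes or v not in g.nodes:
            issues.append(("structure", (u, v)))
    for n, node in g.nodes.items():
        if not node.text.strip():
            issues.append(("text", n))
        if node.label is None:
            issues.append(("label", n))
    return issues


# Assumed ordering: repair structure first, because the text and label
# repairs below rely on neighbor information.
PRIORITY = {"structure": 0, "text": 1, "label": 2}


def plan(issues):
    """Planning step: order the detected issues for repair."""
    return sorted(issues, key=lambda i: (PRIORITY[i[0]], i[1]))


def act(g, issue):
    """Action step: rule-based stand-ins for LLM-driven repairs."""
    kind, target = issue
    if kind == "structure":
        g.edges.discard(target)
        return
    nbrs = [g.nodes[m] for m in neighbors(g, target)]
    if kind == "text":
        texts = [m.text for m in nbrs if m.text.strip()]
        if texts:
            g.nodes[target].text = "; ".join(texts)
    elif kind == "label":
        labels = [m.label for m in nbrs if m.label is not None]
        if labels:
            g.nodes[target].label = Counter(labels).most_common(1)[0][0]


def evaluate(g):
    """Evaluation step: number of issues still detectable."""
    return len(detect(g))


def run(g, max_rounds=3):
    """Repeat detect -> plan -> act until clean or the round budget is spent."""
    for _ in range(max_rounds):
        issues = detect(g)
        if not issues:
            break
        for issue in plan(issues):
            act(g, issue)
    return evaluate(g)


g = TAG(
    nodes={
        0: Node("graph neural networks", "ml"),
        1: Node("", "ml"),                         # missing text
        2: Node("message passing on graphs"),      # missing label
        3: Node("database indexing", "db"),
    },
    edges={(0, 1), (0, 2), (1, 2), (2, 2), (2, 9)},  # self-loop and dangling edge
)
remaining = run(g)
print(remaining, g.nodes[2].label, sorted(g.edges))
```

The point of the sketch is the control flow rather than the repair rules: each round re-detects issues on the updated graph, so a repair in one aspect (here, removing bad edges) can inform repairs in another (text and labels inferred from neighbors).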