LG · CL · Oct 28, 2024

LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

arXiv:2410.21520v4 · 31 citations · h-index: 14 · ACL
Originality: Incremental advance
AI Analysis

The work addresses data completeness problems in domains such as healthcare and finance, though it is an incremental advance in ensemble methods for LLM-based imputation.

The paper tackles missing data imputation by proposing LLM-Forest, a framework that uses an ensemble of LLMs with graph-augmented prompts and confidence-based voting, achieving improved performance on 9 real-world datasets.

Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, making them a promising tool for data imputation. However, challenges persist in designing effective prompts for a finetuning-free process and in mitigating biases and uncertainty in LLM outputs. To address these issues, we propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot prompt learning LLM "trees" whose outputs are aggregated by confidence-weighted voting driven by LLM self-assessment, an approach inspired by ensemble learning (Random Forest). The framework is built on a new concept of bipartite information graphs, which identify high-quality relevant neighboring entries at both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest.
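The aggregation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `weighted_vote` and `weighted_mean`, the `(value, confidence)` input format, and the choice of a weighted mean for numeric features are all assumptions made for illustration. Each "tree" is assumed to return an imputed value together with a self-assessed confidence in [0, 1].

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Pick the categorical value with the largest summed confidence.

    predictions: list of (value, confidence) pairs, one per LLM "tree".
    The pair format is a hypothetical interface, not taken from the paper.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    scores = defaultdict(float)
    for value, confidence in predictions:
        scores[value] += confidence
    # On a tie, the value seen first wins (dict insertion order).
    return max(scores, key=scores.get)


def weighted_mean(predictions):
    """Confidence-weighted average for a numeric feature (an assumed variant)."""
    total = sum(confidence for _, confidence in predictions)
    if total == 0:
        raise ValueError("total confidence must be positive")
    return sum(value * confidence for value, confidence in predictions) / total


# Three hypothetical trees impute a categorical cell.
tree_outputs = [("yes", 0.9), ("no", 0.6), ("no", 0.5)]
print(weighted_vote(tree_outputs))  # "no": 0.6 + 0.5 outweighs 0.9

# Two hypothetical trees impute a numeric cell.
print(weighted_mean([(10.0, 1.0), (20.0, 3.0)]))  # 17.5
```

Summing confidences rather than counting votes lets a single high-confidence tree outweigh several uncertain ones, which is the general idea behind confidence-based weighted voting.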

