CLOct 19, 2023

ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction

arXiv:2310.12537v517 citationsh-index: 14Has Code
Originality Highly original
AI Analysis

This work addresses the need for structured product data in e-commerce platforms, offering incremental improvements in extraction efficiency and robustness for vendors and platforms.

The paper tackled the problem of extracting product attribute-value pairs from unstructured e-commerce descriptions by exploring large language models (LLMs) as a more data-efficient and robust alternative to BERT-based methods. The result showed that GPT-4 achieved the highest average F1-score of 85%, surpassing the best baseline by 5%, with Llama-3-70B offering a competitive open-source alternative.

E-commerce platforms require structured product data in the form of attribute-value pairs to offer features such as faceted product search or attribute-based product comparison. However, vendors often provide unstructured product descriptions, necessitating the extraction of attribute-value pairs from these texts. BERT-based extraction methods require large amounts of task-specific training data and struggle with unseen attribute values. This paper explores using large language models (LLMs) as a more training-data efficient and robust alternative. We propose prompt templates for zero-shot and few-shot scenarios, comparing textual and JSON-based target schema representations. Our experiments show that GPT-4 achieves the highest average F1-score of 85% using detailed attribute descriptions and demonstrations. Llama-3-70B performs nearly as well, offering a competitive open-source alternative. GPT-4 surpasses the best PLM baseline by 5% in F1-score. Fine-tuning GPT-3.5 increases the performance to the level of GPT-4 but reduces the model's ability to generalize to unseen attribute values.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes