Jul 21, 2022

GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods

arXiv:2207.10733v3 | 1 citation | h-index: 27
Originality: Incremental advance
AI Analysis

This dataset addresses a bottleneck for developing ML technologies to promote sustainable consumption in e-commerce, though it is incremental as it builds on existing schema.org standards.

The authors tackled the lack of large, high-quality public datasets with sustainability information for consumer goods by creating GreenDB, a database that collects products from European online shops and uses expert-evaluated sustainability labels as proxies. They demonstrated that machine learning models trained on this data can predict sustainability labels with an F1 score of 96%.

The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in the product search or recommendations of modern retail platforms. However, the lack of large, high-quality, publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As a proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably (F1 score 96%) predict the sustainability label of products. These contributions can help to complement existing e-commerce experiences and ultimately encourage users toward more sustainable consumption patterns.
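To make the two technical claims in the abstract more concrete, the following is a minimal sketch, not the GreenDB schema itself. It assumes a product record that carries a few standard schema.org Product properties (`name`, `description`, `brand`, `url`) plus a hypothetical `sustainability_labels` field, and it shows how an F1 score is computed from prediction counts. The class name `GreenProduct`, the JSON key `sustainabilityLabels`, and the sample values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GreenProduct:
    """Illustrative product record; NOT the actual GreenDB schema."""

    # Standard schema.org properties available on Product.
    name: str
    description: str
    brand: str
    url: str
    # Hypothetical extension field holding sustainability label identifiers.
    sustainability_labels: List[str] = field(default_factory=list)

    def to_jsonld(self) -> Dict[str, object]:
        """Serialize to a schema.org-style JSON-LD dictionary."""
        return {
            "@context": "https://schema.org",
            "@type": "Product",
            "name": self.name,
            "description": self.description,
            "brand": {"@type": "Brand", "name": self.brand},
            "url": self.url,
            # Assumed key name for the extension; illustrative only.
            "sustainabilityLabels": list(self.sustainability_labels),
        }


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    product = GreenProduct(
        name="Example T-Shirt",
        description="A made-up catalog entry used only for illustration.",
        brand="ExampleBrand",
        url="https://example.com/products/1",
        sustainability_labels=["EXAMPLE_LABEL"],
    )
    print(product.to_jsonld())
    # Made-up counts, not results from the paper.
    print(f1_score(tp=96, fp=4, fn=4))
```

Because the extra field sits alongside unchanged schema.org properties, a catalog that already emits Product records would only need to add one key, which is the kind of integration the abstract describes.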

Code Implementations: 1 repo