CL Jan 16, 2024

Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration

Simone Balloccu, Ehud Reiter, Vivek Kumar, Diego Reforgiato Recupero, Daniele Riboni

arXiv:2401.08420v11.0

Originality: Synthesis-oriented

AI Analysis

The paper addresses dataset scarcity in nutrition counselling, though its contribution is incremental: it applies existing methods to a new domain.

The study tackles the lack of public datasets in nutrition counselling by building one through human-AI collaboration. The resulting dataset, HAI-coaching, contains ~2.4K dietary struggles and ~97K supportive texts. The authors find ChatGPT unsuitable for unsupervised use because it exhibits harmful behaviours.

Large Language Models (LLMs), with their flexible generation abilities, can be powerful data sources in domains with few or no available corpora. However, problems like hallucinations and biases limit such applications. In this case study, we pick nutrition counselling, a domain lacking any public resource, and show that high-quality datasets can be gathered by combining LLMs, crowd-workers and nutrition experts. We first crowd-source and cluster a novel dataset of diet-related issues, then work with experts to prompt ChatGPT into producing related supportive text. Finally, we let the experts evaluate the safety of the generated text. We release HAI-coaching, the first expert-annotated nutrition counselling dataset containing ~2.4K dietary struggles from crowd workers, and ~97K related supportive texts generated by ChatGPT. Extensive analysis shows that ChatGPT, while producing highly fluent and human-like text, also manifests harmful behaviours, especially in sensitive topics like mental health, making it unsuitable for unsupervised use.
