DB · CL · Jun 20, 2023

Lingua Manga: A Generic Large Language Model Centric System for Data Curation

arXiv:2306.11702v2 · 14 citations · h-index: 98
Originality: Synthesis-oriented
AI Analysis

This paper addresses data curation problems for users ranging from skilled programmers to no-code users, but it appears incremental because it applies existing LLMs to a new domain.

Task diversity makes it hard to build a general-purpose data curation system. The paper tackles this challenge with Lingua Manga, a user-friendly system that uses pre-trained large language models and offers automatic optimization for high performance and label efficiency. Three example applications demonstrate that it can assist users of varying technical proficiency.

Data curation is a wide-ranging area which contains many critical but time-consuming data processing tasks. However, the diversity of such tasks makes it challenging to develop a general-purpose data curation system. To address this issue, we present Lingua Manga, a user-friendly and versatile system that utilizes pre-trained large language models. Lingua Manga offers automatic optimization for achieving high performance and label efficiency while facilitating flexible and rapid development. Through three example applications with distinct objectives and users of varying levels of technical proficiency, we demonstrate that Lingua Manga can effectively assist both skilled programmers and low-code or even no-code users in addressing data curation challenges.

