CL

Aug 15, 2024

The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

Shachar Don-Yehiya, Leshem Choshen, Omri Abend

IBM

arXiv:2408.08291v2

6.69 citations

h-index: 30

Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the limited availability of open human-model conversation data for model development in the research community, though the contribution is incremental because it builds on existing data collection concepts.

The authors tackled the lack of open human-model conversation data by introducing the ShareLM collection and a plugin for voluntarily contributing chats, enabling users to share and rate conversations from most platforms while maintaining privacy controls.

Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open-source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.
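The workflow the abstract describes (rating at two levels, and deleting a conversation while it is still held locally) can be illustrated with a minimal Python sketch. This is not the plugin's actual code: the names `Turn`, `Conversation`, `LocalStore`, and `flush`, as well as the integer rating scale, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Turn:
    role: str                      # "user" or "model"
    text: str
    rating: Optional[int] = None   # response-level rating (hypothetical scale)


@dataclass
class Conversation:
    conv_id: str
    turns: List[Turn] = field(default_factory=list)
    rating: Optional[int] = None   # conversation-level rating (hypothetical scale)


class LocalStore:
    """Holds conversations locally until the user allows them to be shared."""

    def __init__(self) -> None:
        self._pending: Dict[str, Conversation] = {}

    def add(self, conv: Conversation) -> None:
        self._pending[conv.conv_id] = conv

    def delete(self, conv_id: str) -> None:
        # A deleted conversation is dropped locally and never reaches flush().
        self._pending.pop(conv_id, None)

    def flush(self) -> List[Conversation]:
        # Hand over everything still pending and empty the local store.
        shared = list(self._pending.values())
        self._pending.clear()
        return shared


store = LocalStore()
public = Conversation("c1", [Turn("user", "Hi"), Turn("model", "Hello!")])
private = Conversation("c2", [Turn("user", "A personal question")])

public.rating = 5            # conversation-level rating
public.turns[1].rating = 4   # response-level rating

store.add(public)
store.add(private)
store.delete("c2")           # user opts to keep this one private

shared = store.flush()
```

In this sketch only `c1` is handed over by `flush()`, because `c2` was removed before anything left the local store.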

