RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
RLDS addresses the problem of dataset management and sharing for reinforcement learning researchers, though the contribution is incremental in that it builds on existing tools such as TFDS.
The paper introduces RLDS, an ecosystem for generating, sharing, and using datasets in reinforcement learning and related sequential decision-making fields, aiming to enhance reproducibility, accelerate research, and enable easy testing of algorithms across tasks.
We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM), including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL and Imitation Learning. RLDS not only enables reproducibility of existing research and easy generation of new datasets, but also accelerates novel research. By providing a standard and lossless dataset format, it makes it possible to quickly test new algorithms on a wider range of tasks. The RLDS ecosystem makes it easy to share datasets without any loss of information and to remain agnostic to the underlying original format when applying data processing pipelines to large collections of datasets. In addition, RLDS provides tools for collecting data generated by either synthetic agents or humans, as well as for inspecting and manipulating the collected data. Finally, integration with TFDS facilitates the sharing of RL datasets with the research community.
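To make the idea of a standard, lossless format more concrete, the following is a minimal sketch in plain Python of an episode stored as a sequence of steps. It is an illustration only and does not use the RLDS or TFDS libraries. The class names, the helper functions `episode_from_transitions` and `episode_return`, the field names, and the `(observation, action, reward, done)` source layout are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence, Tuple


@dataclass
class Step:
    """One step of an episode. Field names are assumed for illustration."""
    observation: Any
    action: Any
    reward: float
    discount: float
    is_first: bool     # True for the first step of the episode
    is_last: bool      # True for the last recorded step
    is_terminal: bool  # True if the environment reached a terminal state


@dataclass
class Episode:
    """An episode is an ordered list of steps."""
    steps: List[Step] = field(default_factory=list)


def episode_from_transitions(
    transitions: Sequence[Tuple[Any, Any, float, bool]]
) -> Episode:
    """Convert a hypothetical (observation, action, reward, done) log
    into the common step layout without dropping any field."""
    last = len(transitions) - 1
    steps = []
    for i, (obs, action, reward, done) in enumerate(transitions):
        steps.append(
            Step(
                observation=obs,
                action=action,
                reward=float(reward),
                # Assumed convention: discount 0 on terminal steps, 1 otherwise.
                discount=0.0 if done else 1.0,
                is_first=(i == 0),
                is_last=(i == last),
                is_terminal=bool(done),
            )
        )
    return Episode(steps)


def episode_return(episode: Episode) -> float:
    """Undiscounted sum of rewards; works on any episode in the common
    layout, regardless of the format it was converted from."""
    return sum(step.reward for step in episode.steps)
```

Once every dataset is converted to one such layout, a processing function like `episode_return` can be written once and applied across datasets, which is the kind of format-agnostic pipeline the abstract describes.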