CL · May 27, 2020

MT-Adapted Datasheets for Datasets: Template and Repository

arXiv:2005.13156v1 · 12 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a domain-specific tool for improving dataset documentation in machine translation, but it is incremental, since it builds directly on an existing template.

The authors adapted the standardized datasheet template of Gebru et al. (2018) to document machine translation datasets such as EuroParl and News-Commentary, and proposed a repository for collecting such adapted datasheets in the machine translation research area.

In this report we take the standardized model proposed by Gebru et al. (2018) and use it to document the popular machine translation datasets EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). In this documentation process, we adapt the original datasheet to the particular case of data consumers in the Machine Translation area. We also propose a repository for collecting the adapted datasheets in this research area.
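As a rough illustration of what a repository of adapted datasheets might hold, here is a minimal sketch that stores each datasheet as free-text answers grouped by section and reports which sections are still unanswered. Everything here is an assumption for illustration: the names `Datasheet`, `DatasheetRepository`, `SECTIONS` and the `language_pairs` field are hypothetical, and the section list is only loosely modeled on the question groups in Gebru et al.'s template. It does not reproduce the paper's actual MT-adapted questions or its repository format.

```python
from dataclasses import dataclass, field

# Hypothetical section names, loosely modeled on the question groups in
# Gebru et al.'s datasheet template; NOT the paper's exact MT-adapted schema.
SECTIONS = (
    "motivation",
    "composition",
    "collection_process",
    "preprocessing",
    "uses",
    "distribution",
    "maintenance",
)


@dataclass
class Datasheet:
    """One datasheet: free-text answers keyed by section (illustrative only)."""
    dataset_name: str
    language_pairs: list[str]  # hypothetical MT-specific field
    answers: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        # Sections that have no non-empty answer yet, in template order.
        return [s for s in SECTIONS if not self.answers.get(s, "").strip()]


class DatasheetRepository:
    """In-memory stand-in for a shared datasheet repository (hypothetical)."""

    def __init__(self) -> None:
        self._sheets: dict[str, Datasheet] = {}

    def add(self, sheet: Datasheet) -> None:
        # Reject answers filed under sections the template does not define.
        unknown = set(sheet.answers) - set(SECTIONS)
        if unknown:
            raise ValueError(f"unknown sections: {sorted(unknown)}")
        self._sheets[sheet.dataset_name] = sheet

    def get(self, name: str) -> Datasheet:
        return self._sheets[name]

    def incomplete(self) -> dict[str, list[str]]:
        # Map each stored dataset name to the sections it still lacks.
        result = {}
        for name, sheet in self._sheets.items():
            missing = sheet.missing_sections()
            if missing:
                result[name] = missing
        return result


if __name__ == "__main__":
    repo = DatasheetRepository()
    # Placeholder content only; real answers would come from the datasheet.
    repo.add(Datasheet("EuroParl", ["xx-yy"], {"motivation": "placeholder answer"}))
    print(repo.incomplete())
```

Keeping answers as plain text per section mirrors the question-and-answer nature of a datasheet, while the section check gives a repository a simple way to flag incomplete submissions.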
