MT-Adapted Datasheets for Datasets: Template and Repository
This work provides a domain-specific tool for improving dataset documentation in machine translation, but its contribution is incremental, since it builds directly on an existing template.
The authors adapted the standardized datasheet template from Gebru et al. (2018) to document machine translation datasets such as EuroParl and News-Commentary, and proposed a repository for collecting these adapted datasheets within the machine translation research area.
In this report, we apply the standardized datasheet model proposed by Gebru et al. (2018) to document two popular machine translation datasets, EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). As part of this documentation process, we adapt the original datasheet to the particular needs of data consumers in the machine translation area. We also propose a repository for collecting the adapted datasheets in this research area.