CL, May 19, 2021

Essay-BR: a Brazilian Corpus of Essays

arXiv:2105.09081v1, 21 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

Essay-BR provides a resource for researchers working on essay scoring in Portuguese, though the contribution is incremental: it adapts existing methods to a new language.

The authors address the lack of a Portuguese-language corpus for Automatic Essay Scoring by creating Essay-BR, a large dataset of argumentative essays written by Brazilian high school students and manually graded by experts. They also report an experiment on the corpus that shows the challenges of processing Portuguese text.

Automatic Essay Scoring (AES) is defined as the computer technology that evaluates and scores written essays, aiming to provide computational models to grade essays either automatically or with minimal human involvement. While there are several AES studies in a variety of languages, few of them focus on the Portuguese language. The main reason is the lack of a corpus with manually graded essays. To bridge this gap, we create a large corpus of essays written by Brazilian high school students on an online platform. All of the essays are argumentative and were scored across five competencies by experts. Moreover, we conducted an experiment on the created corpus and showed challenges posed by the Portuguese language. Our corpus is publicly available at https://github.com/rafaelanchieta/essay.

Code Implementations: 1 repo
