Elsevier OA CC-BY Corpus
The corpus gives researchers in scientific text analysis a new dataset, though the contribution is incremental because it builds on existing open-access resources.
The authors introduce the Elsevier OA CC-BY corpus, which is the first open corpus of scientific research papers with a representative sample across disciplines. It includes full text, metadata, and bibliographic references.
We introduce the Elsevier OA CC-BY corpus. This is the first open corpus of scientific research papers that has a representative sample from across scientific disciplines. The corpus includes not only the full text of each article but also the document metadata and the bibliographic information for each reference.