CY · Apr 21

BuyTheBy: A dataset of 18,710 text-based paper mill advertisements with 51,812 timestamped prices

arXiv:2604.24576 · 2.8
Predicted impact top 88% in CY · last 90 days · Originality Synthesis-oriented
AI Analysis

For researchers studying academic fraud, this dataset provides the first large-scale price data on paper mill services, though the analysis is elementary.

This paper introduces BuyTheBy, a dataset of 18,710 text-based paper mill advertisements with 51,812 timestamped prices from seven businesses, enabling quantitative analysis of the market for academic fraud services.

The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 have prices listed. Among these there are 20,598 positions listed as for sale on 5,567 unique products in 14 different product categories with 51,812 timestamped price data points. We perform elementary analysis of this dataset to demonstrate its utility for quantitative understanding of markets for academic fraud services and suggest future use cases.

