How Many Pages? Paper Length Prediction from the Metadata
This work addresses a niche problem relevant to researchers and publishers managing paper submissions, but its contribution is incremental, since it applies existing methods to a new dataset.
The authors frame the prediction of a scientific paper's length from its metadata as a regression task, build a large dataset of publication metadata annotated with page counts, and report experimental results obtained with popular machine learning models.
Being able to predict the length of a scientific paper may be helpful in numerous situations, such as managing paper submissions. This work defines paper length prediction as a regression problem and reports experimental results obtained with several popular machine learning models. We also create a large dataset of publication metadata together with each paper's length in pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and large pretrained language models.
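As a rough illustration of what framing paper length as regression means, here is a minimal sketch that fits an ordinary least-squares model mapping a few metadata fields to a page count and scores it with mean absolute error (MAE). The field names `num_authors` and `abstract_words`, the toy records, and the choice of plain linear regression are all assumptions made for illustration. They are not the dataset's actual schema, and this is not one of the models evaluated in this work.

```python
from typing import Mapping, Sequence

# Hypothetical metadata fields used as regression features; the real
# dataset's schema is not specified in the text.
FEATURES = ("num_authors", "abstract_words")


def to_vector(record: Mapping[str, float]) -> list[float]:
    """Turn one metadata record into a feature vector with a leading bias term."""
    return [1.0] + [float(record[name]) for name in FEATURES]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(
    records: Sequence[Mapping[str, float]], pages: Sequence[float]
) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [to_vector(r) for r in records]
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, pages)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], record: Mapping[str, float]) -> float:
    """Predicted page count: dot product of weights and the feature vector."""
    return sum(w * v for w, v in zip(weights, to_vector(record)))


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted page counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy records for illustration only; not taken from the dataset.
    train = [
        {"num_authors": 1, "abstract_words": 100},
        {"num_authors": 2, "abstract_words": 150},
        {"num_authors": 3, "abstract_words": 200},
        {"num_authors": 4, "abstract_words": 100},
        {"num_authors": 5, "abstract_words": 250},
    ]
    pages = [5, 7, 9, 8, 12]
    w = fit_least_squares(train, pages)
    preds = [predict(w, r) for r in train]
    print("weights:", w)
    print("train MAE (pages):", mean_absolute_error(pages, preds))
```

MAE is used here because it reads directly in pages, which is easy to interpret for this task. Solving the normal equations by hand keeps the sketch dependency-free; with many features, a library solver such as `numpy.linalg.lstsq` would be numerically safer, and a real evaluation would score a held-out split rather than the training records.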