Traditional Readability Formulas Compared for English
This work addresses the need for updated readability tools in NLP, particularly for text simplification and the medical domain. Its contribution is incremental, since it builds on existing formulas.
The paper tackles the problem of outdated readability formulas in NLP by introducing NERF, recalibrating the coefficients of five traditional formulas, and evaluating them for text simplification and medical texts. It also provides a Python program for broader application.
Traditional English readability formulas, or equations, were largely developed in the 20th century. Nonetheless, many researchers still rely on them for various NLP applications, presumably because the formulas are convenient and straightforward. In this work, we contribute to the NLP community by (1) introducing the New English Readability Formula (NERF), (2) recalibrating the coefficients of old readability formulas (Flesch-Kincaid Grade Level, Fog Index, SMOG Index, Coleman-Liau Index, and Automated Readability Index), (3) evaluating the readability formulas for use in text simplification studies and medical texts, and (4) developing a Python-based program for wide application to various NLP projects.
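To make concrete what "recalibrating the coefficients" acts on, the following is a minimal sketch of the five traditional formulas named above, using their classic, widely published coefficients. It is not the program developed in this work. The recalibrated coefficients and the NERF formula are not given in this section and are deliberately not reproduced. The `TextCounts` container and the function names are hypothetical, and the Fog Index here assumes that "complex words" can be approximated by the count of words with three or more syllables.

```python
import math
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Surface counts for one text (hypothetical container for illustration)."""
    sentences: int
    words: int
    syllables: int
    letters: int        # alphabetic characters
    characters: int     # letters and digits
    polysyllables: int  # words with three or more syllables


def flesch_kincaid_grade(c: TextCounts) -> float:
    # Classic coefficients: 0.39, 11.8, -15.59.
    return 0.39 * c.words / c.sentences + 11.8 * c.syllables / c.words - 15.59


def gunning_fog(c: TextCounts) -> float:
    # Assumption: polysyllabic words stand in for "complex words".
    return 0.4 * (c.words / c.sentences + 100 * c.polysyllables / c.words)


def smog(c: TextCounts) -> float:
    # Polysyllable count is normalized to a 30-sentence sample.
    return 1.043 * math.sqrt(c.polysyllables * 30 / c.sentences) + 3.1291


def coleman_liau(c: TextCounts) -> float:
    letters_per_100_words = c.letters / c.words * 100
    sentences_per_100_words = c.sentences / c.words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def automated_readability(c: TextCounts) -> float:
    return 4.71 * c.characters / c.words + 0.5 * c.words / c.sentences - 21.43


if __name__ == "__main__":
    # Made-up counts for a 100-word passage, purely for illustration.
    sample = TextCounts(sentences=5, words=100, syllables=150,
                        letters=450, characters=450, polysyllables=10)
    for formula in (flesch_kincaid_grade, gunning_fog, smog,
                    coleman_liau, automated_readability):
        print(f"{formula.__name__}: {formula(sample):.2f}")
```

Each formula is a linear (or, for SMOG, square-root) combination of a few surface ratios. Recalibration in this sense keeps the same input variables and refits only the numeric constants, which is why the counts are separated from the scoring functions in the sketch.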