CL · Jun 15, 2021

BERT Embeddings for Automatic Readability Assessment

arXiv:2106.07935v2661 citations
AI Analysis

This paper addresses readability assessment for low-resource languages such as Filipino, where NLP tools are limited. The contribution is incremental: it builds on existing BERT and feature-based methods.

The study tackles automatic readability assessment by combining BERT embeddings with handcrafted linguistic features. The combined method outperforms classical approaches on English and Filipino datasets, with up to a 12.4% increase in F1 performance.

Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.
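
To make the idea of a combined feature set concrete, here is a minimal sketch that joins a document embedding with a few handcrafted surface features into one vector for a downstream classifier. The abstract does not say how the two sources are combined, so plain concatenation is an assumption here. The function names (`handcrafted_features`, `combined_features`, `toy_embed`) and the three surface features are hypothetical illustrations, not the paper's feature set, and `toy_embed` is only a stand-in for a real BERT encoder.

```python
import re
from typing import Callable, List, Sequence


def handcrafted_features(text: str) -> List[float]:
    """Three simple surface features (illustrative, not the paper's set)."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of letters only; works for accented characters as well.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def combined_features(
    text: str, embed: Callable[[str], Sequence[float]]
) -> List[float]:
    """Concatenate an embedding with handcrafted features (assumed scheme)."""
    return list(embed(text)) + handcrafted_features(text)


def toy_embed(text: str) -> List[float]:
    """Placeholder for a BERT document embedding; returns a fixed-size vector."""
    return [0.0] * 4


if __name__ == "__main__":
    vector = combined_features("The cat sat. The dog ran fast.", toy_embed)
    print(len(vector), vector)
```

Passing the encoder in as a callable keeps the sketch independent of any particular model library. For a language without reliable syntactic or semantic tooling, `handcrafted_features` could return an empty list, leaving the embedding as the only feature set, which mirrors the substitution the abstract describes.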
