On the Contribution of Discourse Structure on Text Complexity Assessment
This work addresses text complexity assessment for applications such as readability analysis, but its contribution is incremental because it builds on existing feature comparisons.
The paper assesses text complexity by investigating the influence of discourse features. It finds that coherence features are more strongly correlated with text complexity than the other feature types, and that the most discriminating feature is a coherence feature in both data sets.
This paper investigates the influence of discourse features on text complexity assessment. To do so, we created two data sets based on the Penn Discourse Treebank and the Simple English Wikipedia corpora and compared the influence of coherence, cohesion, surface, lexical and syntactic features on text complexity assessment. Results show that, on both data sets, coherence features are more strongly correlated with text complexity than the other types of features. In addition, feature selection revealed that, on both data sets, the most discriminating feature is a coherence feature.
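As an illustration of what comparing features by their correlation with text complexity can look like, the following minimal sketch ranks features by the absolute Pearson correlation between their values and a binary complexity label. The abstract does not state which correlation measure or feature selection method was used, so Pearson correlation is an assumption here. The feature names, the label encoding (1 = complex, 0 = simple) and all numbers are hypothetical, invented only to exercise the code; they are not taken from the paper's data sets and say nothing about its results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant sequence has no defined correlation; report 0.0 by convention.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, labels):
    """Return (feature name, |correlation with labels|) pairs, strongest first."""
    scored = [(name, abs(pearson(values, labels)))
              for name, values in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy input: one value per text for each feature.
# These numbers are made up for illustration only.
toy_features = {
    "coherence:explicit_relation_ratio": [0.9, 0.8, 0.85, 0.2, 0.3, 0.25],
    "surface:avg_sentence_length": [22, 18, 25, 17, 20, 15],
    "lexical:type_token_ratio": [0.5, 0.6, 0.4, 0.5, 0.6, 0.4],
}
toy_labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = complex, 0 = simple

ranking = rank_features(toy_features, toy_labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value treats strong negative and strong positive correlations as equally informative, which suits a ranking whose only purpose is to find the features most associated with the label.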