CL · Jan 12, 2017

A Data-Oriented Model of Literary Language

arXiv:1701.03329v2 · 5.928 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of quantifying literary language, a question of interest to researchers in computational linguistics and digital humanities. It is an incremental advance with a domain-specific application.

The paper tackles the task of predicting how literary a text is, using human ratings as the gold standard. Its model combines a standard bigram baseline, syntactic tree fragments mined from the training set, and hand-picked features, and explains 76.0% of the variation in literary ratings.

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0 % of the variation in literary ratings.
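"Explains 76.0% of the variation" is most naturally read as a coefficient of determination (R²), though the abstract does not name the metric. Assuming that reading, the sketch below shows how such a figure could be computed from gold ratings and model predictions. The function name `r_squared` and the sample numbers are made up for illustration and are not taken from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    # Total variation of the gold ratings around their mean.
    ss_tot = sum((y - mean) ** 2 for y in actual)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all ratings are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical mean reader ratings and model predictions for five novels.
ratings = [2.0, 3.0, 4.0, 5.0, 6.0]
predictions = [2.5, 3.0, 3.5, 5.5, 5.5]

score = r_squared(ratings, predictions)
print(f"R^2 = {score:.3f}")
```

Under this definition, a value of 0.76 would mean that the squared prediction error is 24% of the total squared deviation of the ratings from their mean.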
