CL · Jan 12, 2017

A Data-Oriented Model of Literary Language

arXiv:1701.03329v2 · 28 citations
AI Analysis

This work addresses how to quantify literary language, a problem of interest to researchers in computational linguistics and digital humanities. It is an incremental advance with a specific domain application.

The paper tackles the task of predicting how literary a text is, using human ratings as the gold standard. Its model combines a standard bigram baseline, syntactic tree fragments mined from the training set, and hand-picked features, and explains 76.0% of the variation in literary ratings.

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0 % of the variation in literary ratings.
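
To make two of the terms above concrete, here is a minimal sketch in Python. It assumes that "explains 76.0 % of the variation" refers to the coefficient of determination (R²), which is the usual reading but is not stated explicitly here. The helper names `bigram_counts` and `r_squared` and all the toy numbers are invented for illustration; they are not the authors' code, data, or results.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs (word bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    A value of 0.76 would mean the predictions account for 76% of the
    variance in the true ratings.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy example: features for one short "text".
tokens = "the old house and the old man".split()
features = bigram_counts(tokens)

# Hypothetical human ratings and model predictions for four novels.
ratings = [2.0, 4.0, 6.0, 8.0]
predictions = [3.0, 4.0, 6.0, 7.0]
score = r_squared(ratings, predictions)
```

In a full pipeline, bigram counts like these would be one feature group fed to a regression model alongside the tree-fragment and hand-picked features, and R² would be computed on held-out novels rather than on the training data.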

Code Implementations: 1 repo
