Ranking Creative Language Characteristics in Small Data Scenarios
This work addresses the cost of labeling data for creative language ranking, a capability that matters for downstream language applications. The contribution is incremental, since it builds on existing models.
The paper tackles the problem of ranking creative language with limited labeled data by adapting the DirectRanker model and combining it with Gaussian process preference learning, achieving average Spearman's ρ improvements of 14% and 16% on humor and metaphor novelty tasks compared to previous state-of-the-art methods.
The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed, but its application to text has not been fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state-of-the-art on humor and metaphor novelty tasks, increasing Spearman's ρ by 14% and 16% on average.
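Since the results are reported as Spearman's ρ, a minimal sketch of how that metric compares a model's ranking against gold scores may help. It is not the paper's evaluation code: the function names and the example scores below are hypothetical, chosen only for illustration.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(predicted, gold):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rg = average_ranks(predicted), average_ranks(gold)
    mp, mg = mean(rp), mean(rg)
    cov = sum((a - mp) * (b - mg) for a, b in zip(rp, rg))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_g = sum((b - mg) ** 2 for b in rg)
    if var_p == 0 or var_g == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_p * var_g) ** 0.5

# Hypothetical scores for five sentences (made up for illustration only).
gold_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
model_scores = [0.2, 0.3, 0.5, 0.7, 0.95]
rho = spearman_rho(model_scores, gold_scores)
```

Only the relative order of the scores matters here, so a ranker is not penalized for producing scores on a different scale from the gold annotations. In practice, a library routine such as `scipy.stats.spearmanr` computes the same statistic.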