CL · AI · May 20, 2016

As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in Serbian

arXiv:1605.06319v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work helps linguists and researchers preserve cultural heritage by keeping simile corpora up to date through incremental updates.

The paper tackles the challenge of collecting contemporary similes in Serbian with a semi-automated methodology based on text mining, expanding an existing corpus from 333 to 779 similes (446 new expressions).

Similes are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Keeping a corpus of similes up to date is challenging, as new similes are constantly coined and existing ones adapted to contemporary times. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining techniques. We expanded an existing corpus of traditional similes (containing 333 similes) by collecting 446 additional expressions. We also explore how crowdsourcing can be used to extract and curate new similes.
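To make the idea of pattern-based collection concrete, here is a minimal sketch of how simile candidates might be pulled from raw text and merged into an existing corpus. It is not the authors' pipeline: the abstract does not specify the patterns used, and the English "as X as Y" template (borrowed from the title), the names `SIMILE_PATTERN`, `extract_candidates`, and `expand_corpus`, and the tuple representation are all assumptions made for illustration. A Serbian version would need its own templates, and every match would still need manual or crowdsourced curation.

```python
import re

# Hypothetical template for English "as PROPERTY as (a/an/the) VEHICLE" candidates.
# The paper targets Serbian; this pattern is an illustrative assumption only.
SIMILE_PATTERN = re.compile(
    r"\bas\s+(?P<property>[a-z]+)\s+as\s+(?:(?:a|an|the)\s+)?(?P<vehicle>[a-z]+)\b",
    re.IGNORECASE,
)


def extract_candidates(text):
    """Return lowercased (property, vehicle) pairs matching the template."""
    return [
        (m.group("property").lower(), m.group("vehicle").lower())
        for m in SIMILE_PATTERN.finditer(text)
    ]


def expand_corpus(existing, candidates):
    """Add candidates missing from the corpus (a set) and return the new ones."""
    new = []
    for candidate in candidates:
        if candidate not in existing and candidate not in new:
            new.append(candidate)
    existing.update(new)
    return new


corpus = {("cool", "cucumber")}
found = extract_candidates("He stayed as cool as a cucumber and as brave as a lion.")
added = expand_corpus(corpus, found)
```

The template also matches non-similes such as "as soon as possible", which is why the candidates are only a starting point for human review.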

Code Implementations: 1 repo
