CL · Aug 21, 2021

How Cute is Pikachu? Gathering and Ranking Pokémon Properties from Data with Pokémon Word Embeddings

Mika Hämäläinen, Khalid Alnajjar, Niko Partanen

arXiv:2108.09546v1 · 0.51 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific task for Pokémon enthusiasts, but it is incremental as it applies existing NLP methods to a new dataset.

The study tackles the problem of automatically ranking descriptive properties for the original 151 Pokémon by training word embeddings on a domain-specific corpus. It finds that Word2Vec produces less noisy results than fastText, although every method still yields a considerable amount of noise.

We present different methods for automatically obtaining descriptive properties for the 151 original Pokémon. We train several different word embedding models on a crawled Pokémon corpus and use them to automatically rank English adjectives based on how characteristic they are of a given Pokémon. Based on our experiments, it is better to train a model on domain-specific data than to use a pretrained model. Word2Vec produces less noise in the results than the fastText model. Furthermore, we automatically expand the list of properties for each Pokémon. However, none of the methods is spot on, and there is a considerable amount of noise in the different semantic models. Our models have been released on Zenodo.
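
The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes that "how characteristic" an adjective is can be approximated by cosine similarity between the Pokémon's vector and the adjective's vector; the abstract does not spell out the scoring function, so this is one plausible reading and not the authors' exact method. The vectors in `TOY_VECTORS` are made up for illustration and the function names are hypothetical; in practice the vectors would come from an embedding model trained on a Pokémon corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_adjectives(target, adjectives, vectors):
    """Return (adjective, score) pairs sorted from most to least similar to `target`.

    Adjectives missing from `vectors` are skipped.
    """
    target_vec = vectors[target]
    scored = [
        (adj, cosine(target_vec, vectors[adj]))
        for adj in adjectives
        if adj in vectors
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made 3-dimensional toy vectors (assumption: NOT taken from the released models).
TOY_VECTORS = {
    "pikachu": [0.9, 0.8, 0.1],
    "cute": [0.8, 0.9, 0.0],
    "electric": [0.7, 0.3, 0.6],
    "heavy": [0.0, 0.1, 0.9],
}

ranking = rank_adjectives("pikachu", ["heavy", "cute", "electric"], TOY_VECTORS)
for adjective, score in ranking:
    print(f"{adjective}: {score:.3f}")
```

With these toy vectors, "cute" ranks above "electric", which ranks above "heavy". Swapping in vectors from a real embedding model only requires replacing the `TOY_VECTORS` dictionary with a lookup into that model.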
