CL Jan 26, 2021

Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings

Katharina Kann, Mauro M. Monsalve-Mercado

arXiv:2101.10565v132.7800 citations

Originality: Incremental advance

AI Analysis

This work addresses how poorly character embeddings are understood in NLP and offers insight into how closely models align with human perception. It is rated incremental because it builds on existing embedding methods.

The study investigated the similarity between character embeddings from 10 models and human synesthetic perception of characters, finding that LSTMs align more with humans than transformers, grapheme-to-phoneme conversion yields the most human-like embeddings, and ELMo embeddings differ from both humans and other models.

In contrast to their word- or sentence-level counterparts, character embeddings are still poorly understood. We aim at closing this gap with an in-depth study of English character embeddings. For this, we use resources from research on grapheme-color synesthesia -- a neuropsychological phenomenon where letters are associated with colors, which give us insight into which characters are similar for synesthetes and how characters are organized in color space. Comparing 10 different character embeddings, we ask: How similar are character embeddings to a synesthete's perception of characters? And how similar are character embeddings extracted from different models? We find that LSTMs agree with humans more than transformers. Comparing across tasks, grapheme-to-phoneme conversion results in the most human-like character embeddings. Finally, ELMo embeddings differ from both humans and other models.
