LG | Mar 25, 2021

Persistence Homology of TEDtalk: Do Sentence Embeddings Have a Topological Shape?

arXiv:2103.14131v1 | 6 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental study for researchers in natural language processing and TDA. It reports a negative result: the topological shape of sentence embeddings did not help with public speaking rating classification.

The paper tackled the problem of improving public speaking rating classification by applying topological data analysis (TDA) to sentence embeddings from TEDtalk data, but found that adding persistence image vectors as inputs did not significantly improve model accuracy and sometimes slightly worsened it.

Topological data analysis (TDA) has recently emerged as a new technique for extracting meaningful discriminative features from high-dimensional data. In this paper, we investigate the possibility of applying TDA to improve the classification accuracy of public speaking rating. We calculated persistence image vectors for the sentence embeddings of TEDtalk data and fed these vectors as additional inputs to our machine learning models. We found a negative result: this topological information does not improve the model accuracy significantly, and in some cases it makes the accuracy slightly worse than the original. From our results, we could not conclude that the topological shapes of the sentence embeddings can help us train a better model for public speaking rating.
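To make the idea of "persistence image vectors as additional inputs" concrete, here is a minimal pure-Python sketch. It is not the authors' pipeline: it assumes only 0-dimensional homology (connected components) of a Vietoris-Rips filtration, a hypothetical 4x4 grid, and toy 3-dimensional "embeddings". The names `h0_persistence`, `persistence_image`, `base_features`, and all parameter values are illustrative assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional persistent homology.

    Every component is born at 0 and dies at the edge length that merges
    it into another component (Kruskal's algorithm with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    return pairs


def persistence_image(pairs, resolution=4, max_value=2.0, sigma=0.2):
    """Flattened resolution x resolution image over [0, max_value]^2.

    Each (birth, persistence) point contributes a Gaussian bump weighted
    linearly by its persistence. Grid size and sigma are assumptions.
    """
    step = max_value / resolution
    centers = [(k + 0.5) * step for k in range(resolution)]
    image = []
    for y in centers:  # persistence axis
        for x in centers:  # birth axis
            value = 0.0
            for birth, death in pairs:
                pers = death - birth
                sq_dist = (x - birth) ** 2 + (y - pers) ** 2
                value += pers * math.exp(-sq_dist / (2 * sigma ** 2))
            image.append(value)
    return image


# Toy stand-ins for the sentence embeddings of one talk (hypothetical data).
embeddings = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.1, 0.0)]
base_features = [0.5, 0.2]  # hypothetical non-topological features

pairs = h0_persistence(embeddings)
topo_features = persistence_image(pairs)
model_input = base_features + topo_features  # vector handed to a classifier
```

In this sketch the topological features are simply concatenated to the existing feature vector, which is one plausible reading of "additional inputs". A fuller treatment would likely include higher-dimensional homology and a dedicated TDA library.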

