LG · Mar 25, 2021

Persistence Homology of TEDtalk: Do Sentence Embeddings Have a Topological Shape?

Shouman Das, Syed A. Haque, Md. Iftekhar Tanveer

arXiv:2103.14131v1 · 4.46 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental study for researchers in natural language processing and topological data analysis (TDA). It reports a negative result: the topological shape of sentence embeddings may not help with this specific task, public speaking rating classification.

The paper tackled the problem of improving public speaking rating classification by applying topological data analysis (TDA) to sentence embeddings from TEDtalk data, but found that adding persistence image vectors as inputs did not significantly improve model accuracy and sometimes slightly worsened it.

Topological data analysis (TDA) has recently emerged as a new technique for extracting meaningful discriminative features from high-dimensional data. In this paper, we investigate the possibility of applying TDA to improve the classification accuracy of public speaking rating. We calculated persistence image vectors for the sentence embeddings of TEDtalk data and fed these vectors as additional inputs to our machine learning models. We found a negative result: this topological information does not improve the model accuracy significantly. In some cases, it makes the accuracy slightly worse than the original. From our results, we could not conclude that the topological shapes of the sentence embeddings can help us train a better model for public speaking rating.
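To make the idea of a persistence image vector concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' pipeline: the abstract does not say which homology dimensions, library, or image parameters were used. The sketch assumes 0-dimensional homology only (connected components of a Vietoris-Rips filtration, where every class is born at 0), so the image reduces to a one-dimensional vector over persistence values. The function names `h0_deaths`, `persistence_image`, and `augment`, and the parameters `resolution` and `sigma`, are hypothetical.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Death times of the 0-dimensional classes (connected components) in a
    Vietoris-Rips filtration. Every class is born at 0 and dies at the edge
    length where its component merges into another (Kruskal's algorithm)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite values; the class that never dies is omitted


def persistence_image(deaths, resolution=8, sigma=0.5):
    """Simplified 1-D persistence image (assumption: births are all 0).
    Each death time contributes a Gaussian bump, weighted linearly by its
    persistence, sampled at `resolution` evenly spaced grid centers."""
    top = max(deaths, default=0.0)
    if top == 0.0:
        return [0.0] * resolution
    step = top / resolution
    centers = [(k + 0.5) * step for k in range(resolution)]
    return [
        sum((p / top) * math.exp(-((c - p) ** 2) / (2 * sigma**2)) for p in deaths)
        for c in centers
    ]


def augment(features, embeddings, resolution=8):
    """Append the topological vector to an existing feature vector."""
    image = persistence_image(h0_deaths(embeddings), resolution=resolution)
    return list(features) + image
```

In this sketch, each talk would be represented by a list of sentence-embedding vectors passed as `embeddings`, and `augment` returns the original features followed by `resolution` extra values that a classifier can consume alongside them.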
