Detection of Slang Words in e-Data using semi-Supervised Learning
This work addresses the challenge of identifying abusive language in online communication for content moderation, though the contribution appears incremental.
The paper tackles the problem of detecting slang words in electronic data, including abbreviated forms, using a semi-supervised learning approach that evaluates the probability of suspicious words being slang.
The proposed algorithmic approach determines the sense of a word in electronic data. Nowadays, in communication media such as the internet and mobile services, people use some words that are slang in nature. The approach detects these abusive words using a supervised learning procedure. In real-life usage, however, slang words do not always appear in their complete forms; most of the time they appear in abbreviated forms such as sound-alike forms and taboo morphemes. The proposed approach can also detect these abbreviated forms using a semi-supervised learning procedure. Using synset and concept analysis of the text, the probability that a suspicious word is slang is also evaluated.
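To make the idea of matching disguised or abbreviated forms against known slang more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a small hypothetical seed lexicon (`SEED_SLANG`, with mild placeholder words), an assumed character-substitution table (`SUBSTITUTIONS`), and a string-similarity score from Python's `difflib` as a stand-in for the synset- and concept-based probability described above. All names and the threshold value are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical seed lexicon of labeled slang words (mild placeholders).
SEED_SLANG = {"darn", "heck", "jerk"}

# Assumed character substitutions often used to disguise a word.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(token: str) -> str:
    """Lowercase, undo assumed substitutions, and collapse repeated letters."""
    token = token.lower().translate(SUBSTITUTIONS)
    collapsed = []
    for ch in token:
        # Collapsing runs maps "daaarn" to "darn"; lexicon entries are
        # normalized the same way, so the comparison stays consistent.
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def slang_score(token: str, lexicon=SEED_SLANG) -> float:
    """Highest similarity (0..1) between the token and any lexicon entry.

    This is a surface-similarity stand-in, not a calibrated probability.
    """
    norm = normalize(token)
    return max(
        SequenceMatcher(None, norm, normalize(word)).ratio() for word in lexicon
    )


def flag_suspicious(text: str, threshold: float = 0.8):
    """Return (token, score) pairs whose score reaches the assumed threshold."""
    flagged = []
    for raw in text.split():
        token = raw.strip(".,!?;:")
        if not token:
            continue
        score = slang_score(token)
        if score >= threshold:
            flagged.append((token, round(score, 2)))
    return flagged
```

For example, `flag_suspicious("what the h3ck is this")` flags only `h3ck`, because normalization maps it onto the lexicon entry `heck`. A semi-supervised variant could feed high-scoring tokens back into the lexicon over several passes, but that loop and its stopping rule would be further assumptions beyond what the abstract states.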