Anjum

CL
4papers
102citations
Novelty16%
AI Score16

4 Papers

CLApr 23, 2021
Analysis of Online Toxicity Detection Using Machine Learning Approaches

Anjum, Rahul Katarya

Social media and the internet have become an integral part of how people spread and consume information. Over a period of time, social media evolved dramatically, and almost half of the population is using social media to express their views and opinions. Online hate speech is one of the drawbacks of social media nowadays, which needs to be controlled. In this paper, we will understand how hate speech originated and what are the consequences of it; Trends of machine-learning algorithms to solve an online hate speech problem. This study contributes by providing a systematic approach to help researchers to identify a new research direction and elucidating the shortcomings of the studies and model, as well as providing future directions to advance the field.

CLApr 23, 2021
Automated News Summarization Using Transformers

Anushka Gupta, Diksha Chugh, Anjum et al.

The amount of text data available online is increasing at a very fast pace hence text summarization has become essential. Most of the modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a very tiresome and time-consuming task. Hence generating automated summaries for the data and using it to train machine learning models will make these models space and time-efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences from the original document and extracts only those from the text. Whereas in abstractive summarization techniques, the summary is generated after interpreting the original text, hence making it more complicated. In this paper, we will be presenting a comprehensive comparison of a few transformer architecture based pre-trained models for text summarization. For analysis and comparison, we have used the BBC news dataset that contains text data that can be used for summarization and human generated summaries for evaluating and comparing the summaries generated by machine learning models.

CLApr 23, 2021
Analysing Cyberbullying using Natural Language Processing by Understanding Jargon in Social Media

Bhumika Bhatia, Anuj Verma, Anjum et al.

Cyberbullying is of extreme prevalence today. Online-hate comments, toxicity, cyberbullying amongst children and other vulnerable groups are only growing over online classes, and increased access to social platforms, especially post COVID-19. It is paramount to detect and ensure minors' safety across social platforms so that any violence or hate-crime is automatically detected and strict action is taken against it. In our work, we explore binary classification by using a combination of datasets from various social media platforms that cover a wide range of cyberbullying such as sexism, racism, abusive, and hate-speech. We experiment through multiple models such as Bi-LSTM, GloVe, state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus, achieving a higher precision in comparison to models without slang preprocessing.

CLApr 23, 2021
Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech

Tashvik Dhamija, Anjum, Rahul Katarya

In the day and age of social media, users have become prone to online hate speech. Several attempts have been made to classify hate speech using machine learning but the state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explored various feature engineering techniques ranging from different embeddings to conventional NLP algorithms. We also experimented with combinations of different features. From our experimentation, we realized that roBERTa (robustly optimized BERT approach) based sentence embeddings classified using decision trees gives the best results of 0.9998 F1 score. In our paper, we concluded that BERT based embeddings give the most useful features for this problem and have the capacity to be made into a practical robust model.