Emad Kebriaei

3 papers

1,415 citations

Novelty 43%

AI Score 26

Ranked #168,750 of 205,806 authors (top 82%) · #28,704 in CL (top 89%)

3 Papers

CL · Sep 9, 2021

ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization

Alireza Salemi, Emad Kebriaei, Ghazal Neisi Minaei et al.

Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training approaches for abstractive summarization reward summaries that share more words with the main text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary. To summarize more accurately and in a way closer to human writing patterns, we applied modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six summarization tasks as measured by ROUGE and BERTScore. Our models also outperform prior work in textual entailment, question paraphrasing, and multiple-choice question answering. Finally, we conducted a human evaluation and showed that using the semantic score significantly improves summarization results.

CL · Apr 10, 2021

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Alireza Salemi, Nazanin Sabri, Emad Kebriaei et al.

Detecting which parts of a sentence contribute to that sentence's toxicity -- rather than providing a sentence-level verdict of hatefulness -- would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best-performing setting. The experiments start with keyword-based models and are followed by attention-based, named entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.

CL · Feb 23, 2019

Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications

Kayvan Bijari, Hadi Zare, Emad Kebriaei et al.

Over the last few years, machine learning over graph structures has brought significant improvements to text mining applications such as event detection, opinion mining, and news recommendation. One of the primary challenges in this regard is structuring a graph that encodes the features of textual data so that a machine learning algorithm can use them effectively. In addition, exploring and exploiting semantic relations is regarded as a principal step in text mining applications. However, most traditional text mining methods perform somewhat poorly at employing such relations. In this paper, we propose a sentence-level graph-based text representation which includes stop words to consider semantic and term relations. Then, we employ a representation learning approach on the combined graphs of sentences to extract the latent and continuous features of the documents. Finally, the learned features of the documents are fed into a deep neural network for the sentiment classification task. The experimental results demonstrate that the proposed method substantially outperforms related sentiment analysis approaches on several benchmark datasets. Furthermore, our method can be generalized to different datasets without any dependency on pre-trained word embeddings.