Elena Shushkevich

2 Papers

CL · Sep 21, 2023
SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels

Elena Shushkevich, Long Mai, Manuel V. Loureiro et al.

The proliferation of news media outlets has increased the demand for intelligent systems capable of detecting redundant information in news articles in order to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: simple heuristics, such as whether a pair of news articles are both about politics, can provide strong but deceptive downstream performance. Segmenting news similarity datasets into topics improves the training of these models by forcing them to learn how to distinguish salient characteristics within narrower domains. However, this requires the existence of topic-specific datasets, which are currently lacking. In this article, we propose a novel dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. Furthermore, we present four different levels of complexity, specifically designed for the news similarity detection task. We benchmarked the created datasets using MinHash, BERT, SBERT, and SimCSE models.

CL · Jan 14, 2021
TUDublin team at Constraint@AAAI2021 -- COVID19 Fake News Detection

Elena Shushkevich, John Cardiff

The paper is devoted to the participation of the TUDublin team in the Constraint@AAAI2021 COVID19 Fake News Detection Challenge. Today, the problem of fake news detection is more acute than ever in connection with the pandemic. The volume of fake news is increasing rapidly, and it is urgently necessary to create AI tools that allow us to identify and prevent the spread of false information about COVID-19. The main goal of the work was to create a model that would carry out a binary classification of messages from social media as real or fake news in the context of COVID-19. Our team constructed an ensemble consisting of Bidirectional Long Short-Term Memory, Support Vector Machine, Logistic Regression, Naive Bayes, and a combination of Logistic Regression and Naive Bayes. The model allowed us to achieve an F1-score of 0.94, which is within 5% of the best result.