LG · CL · Sep 25, 2025

Leveraging Big Data Frameworks for Spam Detection in Amazon Reviews

Mst Eshita Khatun, Halima Akter, Tasnimul Rehan, Toufiq Ahmed

arXiv:2509.21579v1 · 1 citation · h-index: 4

Originality Synthesis-oriented

AI Analysis

This paper addresses the problem of fraudulent reviews that mislead consumers and damage seller reputations in online shopping, but its contribution is incremental because it applies existing methods to a new dataset.

The research tackles spam detection in Amazon reviews using big data analytics and machine learning, with the aim of improving review authenticity; Logistic Regression achieved 90.35% accuracy.

In this digital era, online shopping is a common part of daily life. Product reviews significantly influence consumer buying behavior and help establish buyer trust. However, the prevalence of fraudulent reviews undermines this trust by misleading consumers and damaging sellers' reputations. This research addresses the issue by applying big data analytics and machine learning approaches to a substantial dataset of Amazon product reviews. The primary objective is to detect and classify spam reviews accurately, thereby improving the authenticity of reviews. Using a scalable big data framework, we efficiently process and analyze large-scale review data, extracting key features indicative of fraudulent behavior. Our study illustrates the utility of various machine learning classifiers in detecting spam reviews, with Logistic Regression achieving an accuracy of 90.35%, thus contributing to a more trustworthy and transparent online shopping environment.
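
To make the classification step concrete, here is a minimal sketch of logistic regression over bag-of-words review features. It is not the authors' pipeline: the abstract does not name the big data framework or the features used, so this version runs in plain Python on a single machine, the word-count features are an assumption, and the six reviews in `REVIEWS` are invented toy examples, not data from the paper.

```python
import math
from collections import Counter

# Invented toy reviews (NOT from the paper's dataset); 1 = spam, 0 = genuine.
REVIEWS = [
    "best product ever buy now",
    "amazing deal buy now cheap",
    "click here buy cheap now",
    "battery lasted two weeks as described",
    "the strap broke after a month",
    "fits well and arrived on time",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def featurize(text, vocab):
    """Sparse word-count vector {feature_index: count}; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return {vocab[tok]: c for tok, c in counts.items() if tok in vocab}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(reviews, labels, epochs=200, lr=0.5):
    """Fit logistic regression with full-batch gradient descent on the log loss."""
    vocab = {}
    for text in reviews:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    rows = [featurize(text, vocab) for text in reviews]
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(weights[j] * v for j, v in x.items()))
            err = p - y  # derivative of the log loss w.r.t. the raw score
            grad_b += err
            for j, v in x.items():
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return vocab, weights, bias


def predict_proba(text, model):
    """Estimated probability that a review is spam."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(weights[j] * v for j, v in x.items()))


def predict(text, model):
    return 1 if predict_proba(text, model) >= 0.5 else 0


model = train(REVIEWS, LABELS)
train_accuracy = sum(
    predict(text, model) == y for text, y in zip(REVIEWS, LABELS)
) / len(REVIEWS)
```

Accuracy on six hand-written training reviews says nothing about real performance; a figure such as the reported 90.35% would have to come from held-out Amazon data. At the scale the abstract describes, the same two stages (feature extraction, then classifier fitting) would normally be delegated to a distributed framework instead of Python loops.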
