Joint Detection of Fraud and Concept Drift in Online Conversations with LLM-Assisted Judgment
This work addresses early fraud detection in online conversations for platform security, though it appears incremental: it combines existing techniques such as ensemble classification and concept drift analysis with LLM integration.
The paper tackles the detection of fake interactions in digital communication with a two-stage framework. The first stage identifies suspicious conversations; the second applies concept drift analysis with LLM assistance to distinguish fraudulent manipulation from benign topic changes. The authors report improved accuracy and interpretability in real-time fraud detection.
Detecting fake interactions in digital communication platforms remains a challenging and insufficiently addressed problem. These interactions may appear as harmless spam or escalate into sophisticated scam attempts, making it difficult to flag malicious intent early. Traditional detection methods often rely on static anomaly detection techniques that fail to adapt to dynamic conversational shifts. One key limitation is that benign topic transitions, referred to as concept drift, are misinterpreted as fraudulent behavior, leading to either false alarms or missed threats. We propose a two-stage detection framework that first identifies suspicious conversations using a tailored ensemble classification model. To improve the reliability of detection, we incorporate a concept drift analysis step using a One-Class Drift Detector (OCDD) to isolate conversational shifts within flagged dialogues. When drift is detected, a large language model (LLM) assesses whether the shift indicates fraudulent manipulation or a legitimate topic change. When no drift is found, the behavior is inferred to be spam-like. We validate our framework using a dataset of social engineering chat scenarios and demonstrate its practical advantages in improving both accuracy and interpretability for real-time fraud detection. To contextualize the trade-offs, we compare our modular approach against a Dual-LLM baseline that performs detection and judgment using different language models.
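
The decision flow described in the abstract can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Verdict`, `route`, `is_suspicious`, `has_drift`, and `llm_says_fraud` are hypothetical, the label strings are chosen here for readability, and the three stages are passed in as plain callables standing in for the ensemble classifier, the OCDD step, and the LLM judgment.

```python
from dataclasses import dataclass
from typing import Callable, List

# A conversation is modeled here as a list of message strings (an assumption).
Conversation = List[str]
Stage = Callable[[Conversation], bool]


@dataclass
class Verdict:
    label: str   # "benign", "spam_like", "fraud", or "benign_topic_change"
    reason: str  # short human-readable explanation of the routing decision


def route(
    conversation: Conversation,
    is_suspicious: Stage,   # stand-in for the stage-one ensemble classifier
    has_drift: Stage,       # stand-in for the concept drift detector (OCDD)
    llm_says_fraud: Stage,  # stand-in for the LLM judgment on a drifted dialogue
) -> Verdict:
    """Route one conversation through the two-stage flow (hypothetical sketch)."""
    # Stage 1: only conversations flagged as suspicious are analyzed further.
    if not is_suspicious(conversation):
        return Verdict("benign", "not flagged by the stage-one classifier")

    # Stage 2a: a flagged conversation with no drift is inferred to be spam-like.
    if not has_drift(conversation):
        return Verdict("spam_like", "flagged, but no conversational drift found")

    # Stage 2b: drift was found, so the LLM decides what the shift means.
    if llm_says_fraud(conversation):
        return Verdict("fraud", "drift judged to be fraudulent manipulation")
    return Verdict("benign_topic_change", "drift judged to be a legitimate topic change")
```

Passing the stages as callables keeps the routing logic independent of any particular classifier, drift detector, or LLM client, which mirrors the modularity the abstract contrasts with the Dual-LLM baseline.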