CL · May 10

PumpSense: Real-Time Detection and Target Extraction of Crypto Pump-and-Dumps on Telegram

arXiv:2605.09431 · 9.3

AI Analysis

For cryptocurrency market regulators and exchanges, this paper provides a message-level detection and extraction benchmark for Telegram-coordinated pump-and-dumps, which the authors believe to be the first for target extraction, addressing the lack of detection that is both fast and reliable.

The paper introduces a manually reviewed dataset of over 280,000 Telegram posts from 39 pump-organizing groups and proposes methods for real-time detection (F1=0.83 at 50 ms latency) and target extraction (LLM accuracy 0.91). Unlike prior market-data-based approaches, these methods operate directly on the coordination messages.

Cryptocurrency pump-and-dump schemes coordinated via Telegram threaten market integrity. However, existing research on this threat has not yet produced solutions that combine reliable results with fast response. This is due in part to the absence of publicly available, message-level labeled data and in part to design choices in existing methods. In this paper, we address both issues. In particular, we introduce a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, all manually reviewed to identify 2,246 pump announcements and their targeted cryptocurrency and exchange. Leveraging this dataset, we define two tasks: real-time pump-announcement detection and target cryptocurrency/exchange extraction. For detection, we compare two machine-learning models: a lightweight tree-based LightGBM classifier (F1=0.79, latency=9.4 µs/sample) and a transformer-based BGE-M3 (F1=0.83, latency=50 ms/sample). With our proposed approach, we show that message analysis can achieve near-instant pump detection at the level of individual Telegram message windows. Unlike prior work that relies purely on market data and typically detects pumps tens of seconds after abnormal trading activity is observed, our method operates directly on the coordination messages themselves and can be evaluated in microseconds per window on commodity hardware. To our knowledge, we also establish the first benchmark for manipulated coin and exchange extraction. We demonstrate that traditional rule-based extraction methods, widely relied upon in prior literature, are ineffective due to ticker ambiguity. In contrast, LLMs achieve the highest accuracy with a score of 0.91.
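
The ticker-ambiguity problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's extraction method or any baseline it evaluates. The ticker list, the example message, and the function name `rule_based_candidates` are all hypothetical, chosen only to show how a simple "match uppercase tokens against a ticker list" rule can return several candidates for one announcement.

```python
import re
from typing import List

# Hypothetical ticker list for illustration only; not taken from the paper's dataset.
KNOWN_TICKERS = {"BTC", "ETH", "ONE", "GAS", "HOT", "NOW"}

# A common rule-based heuristic: treat short all-caps tokens as ticker candidates.
TOKEN_RE = re.compile(r"\b[A-Z]{2,5}\b")


def rule_based_candidates(message: str) -> List[str]:
    """Return each all-caps token that matches a known ticker, in order of first appearance."""
    seen: List[str] = []
    for token in TOKEN_RE.findall(message):
        if token in KNOWN_TICKERS and token not in seen:
            seen.append(token)
    return seen


# Hypothetical announcement: the intended target is GAS, but two ordinary
# English words in the message are also valid tickers.
msg = "PUMP STARTS NOW! The coin is GAS, buy on the exchange and hold. ONE minute left!"
candidates = rule_based_candidates(msg)
print(candidates)  # ['NOW', 'GAS', 'ONE']
```

In this toy case the rule returns three candidates and has no principled way to pick the intended one. That is the kind of ambiguity the abstract gives as the reason rule-based extraction is ineffective.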
