IR AI CL | Feb 6, 2025

QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration

Shaola Ren, Li Ke, Longtao Huang, Dehong Gao, Hui Xue

arXiv:2502.18480v1 | 3.6 | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses query extraction for toxic content exploration, a problem that matters for content moderation systems. The contribution appears incremental, however, since it builds on existing LLM capabilities with specific training enhancements.

The study tackles the problem of automatically extracting effective queries for exploring toxic content, which is often disguised. It proposes QExplorer, a large language model-based approach with a 2-stage training process. In offline tests, QExplorer outperformed several LLMs and humans; in online deployment, it significantly increased toxic item detection.

Automatically extracting effective queries is challenging in information retrieval, especially in toxic content exploration, where such content is likely to be disguised. With the recent achievements of generative Large Language Models (LLMs), we can leverage their capabilities to directly extract effective queries for exploring similar content. This study proposes QExplorer, a large language model based approach to Query Extraction for toxic content Exploration. QExplorer involves a 2-stage training process, instruction Supervised FineTuning (SFT) followed by preference alignment using Direct Preference Optimization (DPO), together with dataset construction that uses feedback from the search system. To verify the effectiveness of QExplorer, a series of offline and online experiments are conducted on our real-world system. The offline empirical results demonstrate that our automatic query extraction outperforms several LLMs and humans. The online deployment shows a significant increase in the detection of toxic items.
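
The second training stage and the feedback-driven dataset construction can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how search feedback becomes preference data, so `build_preference_pairs`, the `feedback_score` field (imagined here as the number of toxic items a query retrieves), and the `min_gap` threshold are all hypothetical. `dpo_loss` is the standard per-pair DPO objective, with `beta=0.1` chosen arbitrarily.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, min_gap=1):
    """Turn scored candidate queries for one source item into DPO pairs.

    ASSUMPTION: `candidates` is a list of (query, feedback_score) tuples,
    where feedback_score is a hypothetical search-system signal such as
    the number of toxic items the query retrieved. The paper's actual
    construction procedure is not described in the abstract.
    """
    pairs = []
    for (q_a, s_a), (q_b, s_b) in combinations(candidates, 2):
        if abs(s_a - s_b) < min_gap:
            continue  # skip pairs whose feedback is too close to rank
        chosen, rejected = (q_a, q_b) if s_a > s_b else (q_b, q_a)
        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    where each argument is the summed log-probability of a response
    under the policy or the frozen reference (here, the SFT model).
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch, a query that retrieves more toxic items than a sibling query becomes the "chosen" response, and the loss falls as the policy raises the chosen query's likelihood relative to the reference model while lowering the rejected one's.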
