IRJun 26, 2023
Contrastive Multi-view Framework for Customer Lifetime Value PredictionChuhan Wu, Jingjie Li, Qinglin Jia et al. · tencent-ai
Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug-and-play solution compatible with various backbone models. It synthesizes multiple heterogeneous LTV regressors with complementary knowledge to improve model robustness and captures sample relatedness via contrastive learning to mitigate the dependency on data abundance. Concretely, we use a decomposed scheme that converts the LTV prediction problem into a combination of estimating consumption probability and payment amount. To alleviate the impact of noisy data on model learning, we propose a multi-view framework that jointly optimizes multiple types of regressors with diverse characteristics and advantages to encode and fuse comprehensive knowledge. To fully exploit the potential of limited training samples, we propose a hybrid contrastive learning method to help capture the relatedness between samples in both classification and regression tasks. We conduct extensive experiments on a real-world game LTV prediction dataset and the results validate the effectiveness of our method. We have deployed our solution online in Huawei's mobile game center and achieved 32.26% of total payment amount gains.
35.9CRMay 14
Analyzing Codes of Conduct for Online Safety in Video Games at ScaleJiuming Jiang, Shidong Pan, Daniel W Woods et al.
Online video games have become major online social spaces where users interact, compete, and create together. These spaces, however, expose users to a wide spectrum of online harms, including harassment, discrimination, inappropriate content, privacy breach, cheating, and more. The shape and severity of such harms vary across game design, mechanics, and community context. To mitigate these harms, game companies issue Codes of Conduct (CoCs) that articulate online safety rules and direct players to safety resources. However, it remains unclear how prevalent CoCs are, what safety, security and privacy violations they govern, and whether they meet growing regulatory and industry expectations. We develop and leverage CONDUCTIFY, a pipeline for identifying and analyzing CoCs at scale. Applied to Steam, the largest PC game marketplace, it located the available CoCs for 350 of the 9,586 multiplayer titles on Steam. We found that CoCs are more available among popular, adult-oriented, and community-driven games, while most multiplayer games operate without CoCs despite regulatory and industry recommendations. Although over 80% of the games with CoCs available consistently address traditional security and safety violations, their governance approaches vary substantially across types of violations. A further asymmetry emerges in specificity. Compared with harms related to gameplay mechanics, the articulations of interpersonal harm and the underage player safety are often less specific, despite their relevance to many game communities. Together, these results inform the improvement of online safety governance and CoC enforcement practices, and building better safety infrastructure for the community of players and developers.
40.8HCMar 27
Characterizing Scam-Driven Human Trafficking Across Chinese Borders and Online Community Responses on RedNoteJiamin Zheng, Yue Deng, Jessica Chen et al.
A new form of human trafficking has emerged across Chinese borders, where individuals are lured to Southeast Asia with fraudulent job offers and then coerced into operating online scams. Despite its massive economic and human toll, this scam-driven trafficking remains underexplored in academic research. Through qualitative analysis of 158 RedNote posts, we examined how Chinese online communities respond to this threat. Our findings reveal that perpetrators exploit cultural ties to recruit victims for cybercriminal roles within self-sustaining compounds, using sophisticated manipulation tactics. Survivors face serious reintegration barriers, including family rejection, as the cultural values that enable trafficking also hinder their recovery. While communities present protective strategies, efforts are complicated by doubts about the reliability of support and cross-border coordination. We discuss key implications for prevention, platform governance, and international cooperation against scam-driven trafficking. Warning: This paper contains descriptions of physical, psychological, and sexual abuse.
HCMar 5, 2025
"Impressively Scary:" Exploring User Perceptions and Reactions to Unraveling Machine Learning Models in Social Media ApplicationsJack West, Bengisu Cagiltay, Shirley Zhang et al.
Machine learning models deployed locally on social media applications are used for features, such as face filters which read faces in-real time, and they expose sensitive attributes to the apps. However, the deployment of machine learning models, e.g., when, where, and how they are used, in social media applications is opaque to users. We aim to address this inconsistency and investigate how social media user perceptions and behaviors change once exposed to these models. We conducted user studies (N=21) and found that participants were unaware to both what the models output and when the models were used in Instagram and TikTok, two major social media platforms. In response to being exposed to the models' functionality, we observed long term behavior changes in 8 participants. Our analysis uncovers the challenges and opportunities in providing transparency for machine learning models that interact with local user data.
CLSep 28, 2025
Emission-GPT: A domain-specific language model agent for knowledge retrieval, emission inventory and data analysisJiashu Ye, Tong Wu, Weiwen Chen et al.
Improving air quality and addressing climate change relies on accurate understanding and analysis of air pollutant and greenhouse gas emissions. However, emission-related knowledge is often fragmented and highly specialized, while existing methods for accessing and compiling emissions data remain inefficient. These issues hinder the ability of non-experts to interpret emissions information, posing challenges to research and management. To address this, we present Emission-GPT, a knowledge-enhanced large language model agent tailored for the atmospheric emissions domain. Built on a curated knowledge base of over 10,000 documents (including standards, reports, guidebooks, and peer-reviewed literature), Emission-GPT integrates prompt engineering and question completion to support accurate domain-specific question answering. Emission-GPT also enables users to interactively analyze emissions data via natural language, such as querying and visualizing inventories, analyzing source contributions, and recommending emission factors for user-defined scenarios. A case study in Guangdong Province demonstrates that Emission-GPT can extract key insights--such as point source distributions and sectoral trends--directly from raw data with simple prompts. Its modular and extensible architecture facilitates automation of traditionally manual workflows, positioning Emission-GPT as a foundational tool for next-generation emission inventory development and scenario-based assessment.
IRDec 3, 2021
Towards Low-loss 1-bit Quantization of User-item Representations for Top-K RecommendationYankai Chen, Yifei Zhang, Yingxue Zhang et al.
Due to the promising advantages in space compression and inference acceleration, quantized representation learning for recommender systems has become an emerging research direction recently. As the target is to embed latent features in the discrete embedding space, developing quantization for user-item representations with a few low-precision integers confronts the challenge of high information loss, thus leading to unsatisfactory performance in Top-K recommendation. In this work, we study the problem of representation learning for recommendation with 1-bit quantization. We propose a model named Low-loss Quantized Graph Convolutional Network (L^2Q-GCN). Different from previous work that plugs quantization as the final encoder of user-item embeddings, L^2Q-GCN learns the quantized representations whilst capturing the structural information of user-item interaction graphs at different semantic levels. This achieves the substantial retention of intermediate interactive information, alleviating the feature smoothing issue for ranking caused by numerical quantization. To further improve the model performance, we also present an advanced solution named L^2Q-GCN-anl with quantization approximation and annealing training strategy. We conduct extensive experiments on four benchmarks over Top-K recommendation task. The experimental results show that, with nearly 9x representation storage compression, L^2Q-GCN-anl attains about 90~99% performance recovery compared to the state-of-the-art model.
IRSep 26, 2021
Why Do We Click: Visual Impression-aware News RecommendationJiahao Xun, Shengyu Zhang, Zhou Zhao et al.
There is a soaring interest in the news recommendation research scenario due to the information overload. To accurately capture users' interests, we propose to model multi-modal features, in addition to the news titles that are widely used in existing works, for news recommendation. Besides, existing research pays little attention to the click decision-making process in designing multi-modal modeling modules. In this work, inspired by the fact that users make their click decisions mostly based on the visual impression they perceive when browsing news, we propose to capture such visual impression information with visual-semantic modeling for news recommendation. Specifically, we devise the local impression modeling module to simultaneously attend to decomposed details in the impression when understanding the semantic meaning of news title, which could explicitly get close to the process of users reading news. In addition, we inspect the impression from a global view and take structural information, such as the arrangement of different fields and spatial position of different words on the impression, into the modeling of multiple modalities. To accommodate the research of visual impression-aware news recommendation, we extend the text-dominated news recommendation dataset MIND by adding snapshot impression images and will release it to nourish the research field. Extensive comparisons with the state-of-the-art news recommenders along with the in-depth analyses demonstrate the effectiveness of the proposed method and the promising capability of modeling visual impressions for the content-based recommenders.