IR Sep 1, 2025
CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets
Yujing Wang, Yiren Chen, Huoran Li et al.
As global e-commerce platforms continue to expand, companies are entering new markets where they face cold-start challenges because human labels and user behavior data are limited. In this paper, we share our experience at Coupang in delivering competitive cold-start relevance-matching performance for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework that uses a multilingual Large Language Model (LLM) to address three challenges: (1) activating the cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge through retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, which led to successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
HC Jan 12, 2018
Predicting Smartphone Battery Life based on Comprehensive and Real-time Usage Data
Huoran Li, Xuanzhe Liu, Qiaozhu Mei
Smartphones and smartphone apps have undergone explosive growth in the past decade. However, smartphone battery technology has not kept pace with the rapid growth in the capacity and functionality of smartphones and apps. As a result, the battery has always been a bottleneck in a user's daily smartphone experience. An accurate estimate of the remaining battery life could greatly help users schedule their activities and use their smartphones more efficiently. Existing studies on battery life prediction have been primitive due to the lack of real-world smartphone usage data at scale. This paper presents a novel method that uses state-of-the-art machine learning models for battery life prediction, based on comprehensive and real-time usage traces collected from smartphones. The proposed method is the first to identify and address the severe missing-data problem in this context, using a principled statistical metric called the concordance index. The method is evaluated on a dataset collected from 51 users over 21 months, which covers comprehensive and fine-grained smartphone usage traces including system status, sensor indicators, system events, and app status. We find that the remaining battery life of a smartphone can be accurately predicted based on how the user uses the device in real time, in the current session, and historically. The machine learning models successfully identify predictive features for battery life and their applicable scenarios.
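For readers unfamiliar with the concordance index mentioned in the abstract above, the following is a minimal sketch of the standard pairwise definition (often called Harrell's C-index). It is not taken from the paper: the function name `concordance_index`, its parameters, and the handling of ties and unobserved outcomes are assumptions made for illustration. The idea is that a pair of sessions is only compared when the shorter observed battery life was actually observed to the end, so incomplete records do not have to be discarded outright.

```python
from itertools import combinations


def concordance_index(observed, predicted, event_observed=None):
    """Fraction of comparable pairs whose predicted order matches the observed order.

    observed:       observed durations (e.g. battery life in minutes)
    predicted:      predicted durations, same length as `observed`
    event_observed: optional flags; False marks a duration that was cut short
                    (hypothetical convention for this sketch)
    """
    n = len(observed)
    if event_observed is None:
        event_observed = [True] * n

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(n), 2):
        if observed[i] == observed[j]:
            continue  # ties in observed duration are skipped in this sketch
        # Let `a` be the index with the shorter observed duration.
        a, b = (i, j) if observed[i] < observed[j] else (j, i)
        if not event_observed[a]:
            continue  # shorter duration was cut short, so the true order is unknown
        comparable += 1
        if predicted[a] < predicted[b]:
            concordant += 1.0  # prediction agrees with the observed order
        elif predicted[a] == predicted[b]:
            concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ordering, and 0.0 means every comparable pair is ranked in reverse.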
SE Jul 27, 2017
Mining Device-Specific Apps Usage Patterns from Large-Scale Android Users
Huoran Li, Xuan Lu
Now that smartphones, applications (a.k.a. apps), and app stores have been adopted by billions of users, an interesting question emerges: whether and to what extent do device models influence the behaviors of their users? The answer to this question is critical to almost every stakeholder in the smartphone app ecosystem, including app store operators, developers, end-users, and network providers. To approach this question, we collect a longitudinal data set of app usage through a leading Android app store in China, called Wandoujia. The data set covers the detailed behavioral profiles of 0.7 million (761,262) unique users who use 500 popular types of Android devices and about 0.2 million (228,144) apps, including their app management activities, daily network access time, and network traffic of apps. We present a comprehensive study of how the choice of device model affects user behaviors such as the adoption of app stores, app selection and abandonment, data plan usage, online time length, the tendency to use paid or free apps, and preferences among competing apps. We derive some significant correlations between device models and app usage, leading to important findings on various user behaviors. For example, users owning different device models differ substantially in which competing apps they select, and users owning lower-end devices spend more money on apps and spend more time on cellular networks.
HC May 16, 2017
Through a Gender Lens: Learning Usage Patterns of Emojis from Large-Scale Android Users
Zhenpeng Chen, Xuan Lu, Wei Ai et al.
Based on a large data set of emoji usage behavior collected from smartphone users around the world, this paper investigates gender-specific usage of emojis. We present various interesting findings that show a considerable difference in emoji usage between female and male users. This difference is significant not just in a statistical sense; it is sufficient for a machine learning algorithm to accurately infer the gender of a user purely from the emojis used in their messages. In real-world scenarios where gender inference is a necessity, models based on emojis have unique advantages over existing models that are based on textual or contextual information. Emojis not only provide language-independent indicators, but also alleviate the risk of leaking private user information through the analysis of text and metadata.
CY Feb 14, 2017
Mining Behavioral Patterns from Millions of Android Users
Xuanzhe Liu, Huoran Li, Xuan Lu et al.
The prevalence of smart mobile devices has promoted the popularity of mobile applications (a.k.a. apps). Supporting mobility has become a promising trend in software engineering research. This article presents an empirical study of behavioral service profiles collected from millions of users whose devices are deployed with Wandoujia, a leading Android app store service in China. The dataset of Wandoujia service profiles consists of two kinds of user behavioral data from the use of 0.28 million free Android apps: (1) app management activities (i.e., downloading, updating, and uninstalling apps) from over 17 million unique users and (2) app network usage from over 6 million unique users. We explore multiple aspects of this behavioral data and present patterns of app usage. Based on the findings and the knowledge derived from them, we also suggest open opportunities and challenges for the research community in areas including app development, deployment, delivery, and revenue.