Ruoyan Kong

IR
h-index: 11
8 papers
34 citations
Novelty: 38%
AI Score: 28

8 Papers

LG · Jun 20, 2023
Less Can Be More: Exploring Population Rating Dispositions with Partitioned Models in Recommender Systems

Ruixuan Sun, Ruoyan Kong, Qiao Jin et al.

In this study, we partition users by rating disposition, looking first at their percentage of negative ratings and then at their general use of the rating scale. We hypothesize that users with different rating dispositions may use the recommender system differently, and therefore agreement in their past ratings may be less predictive of future agreement. We use data from a large movie rating website to explore whether users should be grouped by disposition, focusing on identifying the rating distributions that may hurt recommender effectiveness. We find that such partitioning not only improves computational efficiency but also improves top-k performance and predictive accuracy. These effects are largest for user-based KNN CF, smaller for item-based KNN CF, and smallest for latent factor algorithms such as SVD.
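As a rough illustration of the first partitioning step, the Python sketch below computes each user's share of negative ratings and buckets users by that share; in a partitioned setup, a separate recommender would then be trained on each bucket. The negative-rating threshold (below 2.5 stars), the bucket cut points, the sample ratings, and the function names are hypothetical assumptions, not the values or code used in the paper.

```python
from collections import defaultdict


def negative_rating_share(ratings, negative_threshold=2.5):
    """Return {user_id: fraction of that user's ratings below the threshold}.

    `ratings` is an iterable of (user_id, item_id, rating) tuples.
    The 2.5-star threshold is a hypothetical choice for illustration.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for user_id, _item_id, rating in ratings:
        totals[user_id] += 1
        if rating < negative_threshold:
            negatives[user_id] += 1
    return {user: negatives[user] / totals[user] for user in totals}


def partition_users(shares, cut_points=(0.1, 0.3)):
    """Group users into buckets by negative-rating share.

    With cut_points (0.1, 0.3): share < 0.1 -> bucket 0,
    0.1 <= share < 0.3 -> bucket 1, share >= 0.3 -> bucket 2.
    """
    partitions = defaultdict(list)
    for user_id, share in shares.items():
        bucket = sum(share >= cut for cut in cut_points)  # count cut points passed
        partitions[bucket].append(user_id)
    return dict(partitions)


# Hypothetical ratings on a 0.5-5 star scale.
ratings = [
    ("u1", "m1", 4.0), ("u1", "m2", 5.0), ("u1", "m3", 4.5),
    ("u2", "m1", 1.0), ("u2", "m2", 2.0), ("u2", "m3", 4.0),
    ("u3", "m1", 2.0), ("u3", "m2", 3.5), ("u3", "m3", 4.0), ("u3", "m4", 4.5),
]
shares = negative_rating_share(ratings)
print(shares)
print(partition_users(shares))
```

Bucketing on a single scalar keeps the example small; the abstract's second step (general use of the rating scale) would add further features such as each user's mean rating or rating variance.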

HC · Jun 12, 2023
Getting the Most from Eye-Tracking: User-Interaction Based Reading Region Estimation Dataset and Models

Ruoyan Kong, Ruixuan Sun, Charles Chuankai Zhang et al.

A single digital newsletter usually contains many messages (regions). Users' reading time and reading level (skip, skim, or read in detail) for each message are important for platforms to understand their users' interests, personalize their content, and make recommendations. Using accurate but expensive-to-collect eye-tracker-recorded data, we built models that predict per-region reading time from easy-to-collect JavaScript browser tracking data. With eye-tracking, we collected 200k ground-truth data points on participants reading news in browsers. We then trained machine learning and deep learning models to predict message-level reading time from user interactions such as mouse position, scrolling, and clicking. Against the eye-tracking ground truth, a two-tower neural network using only user interactions reached a 27% percentage error in reading time estimation, whereas the heuristic baselines had around 46% percentage error. We also found benefits in replacing per-session models with per-timestamp models and in adding user pattern features. We conclude with suggestions for developing message-level reading estimation techniques based on available data.
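The abstract reports accuracy as a percentage error without giving its exact definition. As an illustration only, the sketch below assumes a mean absolute percentage error computed per region against the eye-tracking ground truth; the function name, the zero-time handling, and the sample values are hypothetical.

```python
def mean_absolute_percentage_error(true_seconds, predicted_seconds):
    """Mean of |predicted - true| / true, in percent, over regions with true > 0.

    Regions where the eye-tracker recorded zero reading time are skipped
    to avoid division by zero (a hypothetical handling choice).
    """
    if len(true_seconds) != len(predicted_seconds):
        raise ValueError("inputs must have the same length")
    errors = [
        abs(pred - true) / true
        for true, pred in zip(true_seconds, predicted_seconds)
        if true > 0
    ]
    if not errors:
        raise ValueError("no regions with nonzero ground-truth reading time")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical per-region reading times (seconds) for one newsletter.
eye_tracking_truth = [10.0, 4.0, 20.0, 0.0]
model_prediction = [8.0, 5.0, 20.0, 1.5]
print(mean_absolute_percentage_error(eye_tracking_truth, model_prediction))
```

Here the per-region errors are 20%, 25%, and 0%, and the last region is skipped because its ground truth is zero, so the mean is about 15%.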

HC · Aug 9, 2023
Organizational Bulk Email Systems: Their Role and Performance in Remote Work

Ruoyan Kong, Haiyi Zhu, Joseph A. Konstan

The COVID-19 pandemic has forced many employees to work from home. Organizational bulk emails now play a critical role in reaching employees with central information in this work-from-home environment. However, we know from our own recent work that organizational bulk email has problems: recipients fail to retain the bulk messages they receive from the organization; recipients and senders disagree on which bulk messages are important; and communicators lack technology support to better target and design messages. In this position paper, we first review prior work on evaluating, designing, and prototyping organizational communication systems. Second, we review our recent findings and some research techniques we found useful in studying organizational communication. Last, we propose a research agenda for studying organizational communication in remote work environments and suggest some key questions and potential study directions.

IR · Feb 21, 2023
HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Yunzhong He, Cong Zhang, Ruoyan Kong et al.

Query categorization at customer-to-customer e-commerce platforms like Facebook Marketplace is challenging due to the vagueness of search intent, noise in real-world data, and imbalanced training data across languages. Its deployment must also address scalability and downstream integration in order to translate modeling advances into better search result relevance. In this paper, we present HierCat, the query categorization system at Facebook Marketplace. HierCat addresses these challenges by leveraging multi-task pre-training of dual-encoder architectures with a hierarchical inference step to effectively learn from weakly supervised training data mined from searcher engagement. We show that HierCat not only outperforms popular methods in offline experiments but also leads to a 1.4% improvement in NDCG and a 4.3% increase in searcher engagement at Facebook Marketplace Search in online A/B testing.
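The abstract does not describe HierCat's hierarchical inference step in detail. The sketch below shows one generic way such a step can work, greedy top-down decoding over a category taxonomy with a confidence cutoff. It is not HierCat's actual algorithm, and the taxonomy, per-category scores, `hierarchical_decode` name, and 0.5 threshold are invented for illustration.

```python
def hierarchical_decode(scores, children, root="ROOT", min_score=0.5):
    """Greedy top-down category prediction over a taxonomy.

    scores:   {category: model probability for this query}
    children: {category: [child categories]}; leaves have no entry.
    Starting at `root`, repeatedly move to the highest-scoring child,
    stopping early (backing off to the current node) when the best
    child scores below `min_score`. Returns the chosen path.
    """
    path = []
    node = root
    while children.get(node):
        best = max(children[node], key=lambda c: scores.get(c, 0.0))
        if scores.get(best, 0.0) < min_score:
            break  # not confident enough to go deeper
        path.append(best)
        node = best
    return path


# Hypothetical taxonomy and classifier scores for one ambiguous query.
taxonomy = {
    "ROOT": ["Vehicles", "Home"],
    "Vehicles": ["Cars", "Motorcycles"],
    "Home": ["Furniture", "Appliances"],
    "Furniture": ["Sofas", "Tables"],
}
query_scores = {
    "Vehicles": 0.1, "Home": 0.85, "Furniture": 0.7,
    "Appliances": 0.2, "Sofas": 0.4, "Tables": 0.35,
}
print(hierarchical_decode(query_scores, taxonomy))
```

Backing off to a coarser category when the fine-grained prediction is uncertain is one common way to trade specificity for precision with noisy, weakly supervised labels.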

IR · May 21, 2025
Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation

Ruijie Xi, He Ba, Hao Yuan et al.

Embedding-Based Retrieval (EBR) is an important technique in modern search engines, enabling semantic matching between search queries and relevant results. However, search logging data on platforms like Facebook Marketplace lacks the diversity and detail needed for effective EBR model training, limiting the models' ability to capture nuanced search patterns. To address this challenge, we propose Aug2Search, an EBR-based framework that leverages synthetic data generated by Generative AI (GenAI) models in a multimodal and multitask approach to optimize query-product relevance. This paper investigates the capabilities of GenAI, particularly Large Language Models (LLMs), in generating high-quality synthetic data, and analyzes its impact on enhancing EBR models. We conducted experiments using eight Llama models and 100 million data points from Facebook Marketplace logs. Our synthetic data generation follows three strategies: (1) generate queries, (2) enhance product listings, and (3) generate queries from enhanced listings. We train EBR models on three different datasets: sampled engagement data, or original data (e.g., "Click" and "Listing Interactions"); synthetic data; and a mixture of engagement and synthetic data, to assess their performance across various training sets. Our findings underscore the robustness of Llama models in producing synthetic queries and listings with high coherence, relevance, and diversity, while maintaining low levels of hallucination. Aug2Search achieves an improvement of up to 4% in ROC AUC with 100 million synthetic data samples, demonstrating the effectiveness of our approach. Moreover, our experiments reveal that with the same volume of training data, models trained exclusively on synthetic data often outperform those trained on original data only or on a mixture of original and synthetic data.
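A comparison of original-only, synthetic-only, and mixed training data at equal volume can be set up along the lines of the sketch below. The 50/50 mixing ratio, uniform random sampling, the `build_training_sets` name, and the toy examples are assumptions for illustration; the abstract does not specify how the mixture was constructed.

```python
import random


def build_training_sets(original, synthetic, size, seed=0):
    """Build three equal-sized training sets for a controlled comparison.

    Returns {"original": ..., "synthetic": ..., "mixed": ...}. The mixed set
    draws half of its examples from each source (a hypothetical 50/50 ratio)
    and is shuffled so the two sources are interleaved.
    """
    if size > len(original) or size > len(synthetic):
        raise ValueError("size exceeds the available examples")
    rng = random.Random(seed)  # fixed seed for a reproducible comparison
    half = size // 2
    mixed = rng.sample(original, half) + rng.sample(synthetic, size - half)
    rng.shuffle(mixed)
    return {
        "original": rng.sample(original, size),
        "synthetic": rng.sample(synthetic, size),
        "mixed": mixed,
    }


# Hypothetical (source, example_id) pairs standing in for query-listing examples.
original_examples = [("orig", i) for i in range(100)]
synthetic_examples = [("syn", i) for i in range(100)]
sets = build_training_sets(original_examples, synthetic_examples, size=50)
print({name: len(data) for name, data in sets.items()})
```

Holding the size fixed across all three sets is what makes the "same volume of training data" comparison meaningful: any difference in model quality then reflects the data source rather than the amount of data.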

IR · May 17, 2024
The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems

Guy Aridor, Duarte Goncalves, Ruoyan Kong et al.

An increasingly important aspect of designing recommender systems involves considering how recommendations will influence consumer choices. This paper addresses this issue by introducing a method for collecting user beliefs about items they have not yet experienced, a critical predictor of choice behavior. We implemented this method on the MovieLens platform, resulting in a rich dataset that combines user ratings, beliefs, and observed recommendations. We document challenges to such data collection, including selection bias in responses and limited coverage of the product space. This unique resource enables researchers to study user behavior more deeply: to analyze user choices in the absence of recommendations, measure the effectiveness of recommendations, and prototype algorithms that leverage user belief data, ultimately leading to more impactful recommender systems. The dataset can be found at https://grouplens.org/datasets/movielens/ml_belief_2024/.

IR · Jan 21, 2024
What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders

Ruixuan Sun, Xinyi Wu, Avinash Akella et al.

In the past decade, deep learning (DL) models have gained prominence for their exceptional accuracy on benchmark datasets in recommender systems (RecSys). However, their evaluation has primarily relied on offline metrics, overlooking direct user perception and experience. To address this gap, we conduct a human-centric evaluation case study of four leading DL-RecSys models in the movie domain. We test how different DL-RecSys models perform in personalized recommendation generation by conducting a survey study with 445 real active users. We find that some DL-RecSys models are better at recommending novel and unexpected items but weaker in diversity, trustworthiness, transparency, accuracy, and overall user satisfaction compared to classic collaborative filtering (CF) methods. To further explain the reasons behind this underperformance, we apply a comprehensive path analysis. We discover that the lack of diversity and excessive serendipity of DL models can negatively impact the perceived transparency and personalization of recommendations. Such a path ultimately leads to lower summative user satisfaction. Qualitatively, we confirm with real user quotes that accuracy plus at least one other attribute is necessary to ensure a good user experience, while users' demands for transparency and trust cannot be neglected. Based on our findings, we discuss future human-centric DL-RecSys design and optimization strategies.

HC · Jun 30, 2020
Learning to Ignore: A Case Study of Organization-Wide Bulk Email Effectiveness

Ruoyan Kong, Haiyi Zhu, Joseph A. Konstan

Bulk email is a primary communication channel within organizations, with all-company emails and regular newsletters serving as a mechanism for making employees aware of policies and events. Ineffective communication could result in wasted employee time and a lack of compliance or awareness. Previous studies on organizational emails focused mostly on recipients. However, the organizational bulk email system is a multi-stakeholder problem involving recipients, communicators, and the organization itself. We studied the effectiveness, practice, and assessment of the organizational bulk email system of a large university from the perspectives of multiple stakeholders. We conducted a qualitative study with the university's communicators, recipients, and managers. We examined communicators' distribution mechanisms, recipients' reading behaviors, and how communicators, managers, and recipients view the value of these emails. We found that the organizational bulk email system as a whole was strained, and communicators were caught in the middle of this multi-stakeholder problem. First, though the communicators had an interest in preserving the effectiveness of channels for reaching employees, they had high-level clients whose interests might outweigh their judgment about whether a message deserved widespread circulation. Second, though communicators thought they were sending important information, recipients viewed most organizational bulk emails as not relevant to them. Third, this disagreement was amplified by the success metric used by communicators: they viewed their bulk emails as successful if they had a high open rate, but recipients often opened and then rapidly discarded emails without reading the details. Last, while communicators generally understood the challenge, they had a limited set of targeting and feedback tools to support their task.
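To make the open-rate problem concrete, the sketch below contrasts an open rate with a stricter read rate that also requires a minimum time spent on the message. The 10-second cutoff, the event format, the `engagement_summary` name, and the sample numbers are hypothetical; the study itself was qualitative and did not define this metric.

```python
def engagement_summary(events, min_read_seconds=10.0):
    """Compare open rate with a dwell-time-based read rate.

    events: list of dicts with "opened" (bool) and "dwell_seconds" (float).
    Open rate counts any open; read rate additionally requires the message
    to stay open for at least `min_read_seconds` (a hypothetical cutoff).
    """
    if not events:
        raise ValueError("no delivery events")
    total = len(events)
    opened = sum(1 for e in events if e["opened"])
    read = sum(
        1 for e in events
        if e["opened"] and e["dwell_seconds"] >= min_read_seconds
    )
    return {"open_rate": opened / total, "read_rate": read / total}


# Hypothetical outcomes for 10 recipients of one bulk email:
# 6 open and discard within seconds, 2 read it, 2 never open it.
events = (
    [{"opened": True, "dwell_seconds": 2.0}] * 6
    + [{"opened": True, "dwell_seconds": 45.0}] * 2
    + [{"opened": False, "dwell_seconds": 0.0}] * 2
)
print(engagement_summary(events))
```

In this toy case the email looks successful by open rate (80%), yet only 20% of recipients spend meaningful time on it, which mirrors the gap between communicators' metric and recipients' actual behavior described above.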