Victor Chen

CY
h-index: 32
Papers: 8
Citations: 109
Novelty: 34%
AI Score: 41

8 Papers

CY Mar 3, 2022
A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions

Francois St-Hilaire, Dung Do Vu, Antoine Frau et al.

Although artificial intelligence (AI) has transformed major aspects of our society, only a fraction of its potential has been explored, let alone deployed, for education. AI-powered learning can provide millions of learners with a highly personalized, active and practical learning experience, which is key to successful learning. This is especially relevant in the context of online learning platforms. In this paper, we present the results of a comparative head-to-head study on learning outcomes for two popular online learning platforms (n=199 participants): a MOOC platform following a traditional model delivering content using lecture videos and multiple-choice quizzes, and the Korbit learning platform providing a highly personalized, active and practical learning experience. We observe a large and statistically significant increase in learning outcomes: students who receive full feedback on the Korbit platform show higher course completion rates and achieve learning gains 2 to 2.5 times higher than both students on the MOOC platform and students in a control group who do not receive personalized feedback on the Korbit platform. The results demonstrate the impact that can be achieved with a personalized, active-learning, AI-powered system. Making this technology and learning experience available to millions of learners around the world will represent a significant leap forward towards the democratization of education.

IR May 22
Memento: Personalized RAG-Style Long-Retention Data Scaling for META Ads Recommendation

Xiaoyu Chen, Ruichen Wang, Jieming Di et al.

Modeling long-history data suffers from attention dilution over long context windows, poor system efficiency, and catastrophic forgetting, and naive linear-scaling approaches such as LastN fail under these conditions. We introduce Memento, a personalized retrieval-augmented framework that treats historical user engagements as a document corpus and ad requests as queries, retrieving relevant interactions via Maximal Marginal Relevance (MMR) to balance similarity with diversity. We identify two complementary applications: Representation Memento, which retrieves historical embeddings for feature augmentation, and Data Memento, which retrieves past training examples for multipass training. Through infrastructure co-design (temporal chunking, INT8 quantization, and asynchronous serving), Memento achieves 5-10× resource efficiency over linear scaling. Memento processes daily requests with sub-10ms latency, yielding a 0.25-0.3% Normalized Entropy gain on both click-through and conversion prediction. In production, Memento delivers a 1% CTR lift on Facebook Feed and Reels and a 1.2% CVR lift, scaling personalization to 365+ days of history.
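
The MMR retrieval step mentioned in this abstract can be illustrated with a minimal sketch. This is the generic greedy MMR rule, not Memento's implementation: the cosine similarity measure, the trade-off weight `lam`, and the names `cosine` and `mmr_select` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_select(query, candidates, k, lam=0.5):
    """Greedily pick up to k candidate indices by Maximal Marginal Relevance.

    Each step picks the candidate maximizing
        lam * sim(query, c) - (1 - lam) * max(sim(c, s) for s already selected).
    lam is a hypothetical trade-off weight: 1.0 is pure relevance,
    smaller values penalize redundancy with earlier picks more strongly.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: items 0 and 1 are near-duplicates; item 2 is less relevant but different.
query = [1.0, 0.0]
history = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
diverse = mmr_select(query, history, k=2, lam=0.5)
relevance_only = mmr_select(query, history, k=2, lam=1.0)
```

With `lam=1.0` the rule reduces to plain top-k by similarity and returns both near-duplicates. With `lam=0.5` the second pick shifts to the less similar but non-redundant item.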

PL Apr 19
&inator: Correct, Precise C-to-Rust Interface Translation

Victor Chen, Ayden Coughlin, Michael D. Bond

Automatically translating system software from C to Rust is an appealing but challenging problem, as it requires whole-program reasoning to satisfy Rust's ownership and borrowing discipline. A key enabling step in whole-program translation is interface translation, which produces Rust declarations for the C program's top-level declarations (i.e., structs and function signatures), enabling modular and incremental code translation. This paper introduces correct, precise C-to-Rust interface translation, called &inator. &inator employs a novel constraint-based formulation of semantic equivalence and type correctness including borrow-checking rules to produce a Rust interface that is correct (i.e., the interface admits a semantics-preserving implementation in safe Rust) and precise (i.e., it uses the simplest, least costly types). Our results show &inator produces correct, precise Rust interfaces for real C programs, but support for certain C features and scaling to large programs are challenges left for future work. This work advances the state of the art by being the first correct, precise approach to C-to-Rust interface translation.

LG Nov 20, 2024
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Yoel Zimmermann, Adib Bazgir, Zartashia Afzal et al.

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as a brief paper in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping of custom applications in scientific research.

CL Jul 11, 2021
Computer-assisted construct classification of organizational performance concerning different stakeholder groups

Seethalakshmi Gopalakrishnan, Victor Chen, Gus Hahn-Powell et al.

The number of research articles in business and management has dramatically increased along with terminology, constructs, and measures. Proper classification of organizational performance constructs from research articles plays an important role in categorizing the literature and understanding to whom its research implications may be relevant. In this work, we classify constructs (i.e., concepts and terminology used to capture different aspects of organizational performance) in research articles into a three-level categorization: (a) performance and non-performance categories (Level 0); (b) for performance constructs, stakeholder group-level of performance concerning investors, customers, employees, and the society (community and natural environment) (Level 1); and (c) for each stakeholder group-level, subcategories of different ways of measurement (Level 2). We observed that increasing contextual information with features extracted from surrounding sentences and external references improves classification of disaggregate-level labels, given limited training data. Our research has implications for computer-assisted construct identification and classification - an essential step for research synthesis.

HC Jun 2, 2021
Understanding the Design Space of Mouth Microgestures

Victor Chen, Xuhai Xu, Richard Li et al.

As wearable devices move toward the face (e.g., smart earbuds, glasses), there is an increasing need to facilitate intuitive interactions with these devices. Current sensing techniques can already detect many mouth-based gestures; however, users' preferences for these gestures are not fully understood. In this paper, we investigate the design space and usability of mouth-based microgestures. We first conducted brainstorming sessions (N=16) and compiled an extensive set of 86 user-defined gestures. Then, with an online survey (N=50), we assessed the physical and mental demand of our gesture set and identified a subset of 14 gestures that can be performed easily and naturally. Finally, we conducted a remote Wizard-of-Oz usability study (N=11) mapping gestures to various daily smartphone operations in sitting and walking contexts. From these studies, we develop a taxonomy for mouth gestures, finalize a practical gesture set for common applications, and provide design guidelines for future mouth-based gesture interactions.

CY Oct 14, 2020
My Team Will Go On: Differentiating High and Low Viability Teams through Team Interaction

Hancheng Cao, Vivian Yang, Victor Chen et al.

Understanding team viability (a team's capacity for sustained and future success) is essential for building effective teams. In this study, we aggregate features drawn from the organizational behavior literature to train a viability classification model over a dataset of 669 10-minute text conversations of online teams. We train classifiers to identify teams at the top decile (most viable teams), 50th percentile (above a median split), and bottom decile (least viable teams), then characterize the attributes of teams at each of these viability levels. We find that a lasso regression model achieves an AUC ROC of 0.74 to 0.92 under different thresholds for classifying viability scores. From these models, we identify the use of exclusive language such as 'but' and 'except', and the use of second person pronouns, as the most predictive features for detecting the most viable teams, suggesting that active engagement with others' ideas is a crucial signal of a viable team. Only a small fraction of the 10-minute discussion, as little as 70 seconds, is required for predicting the viability of team interaction. This work suggests opportunities for teams to assess, track, and visualize their own viability in real time as they collaborate.

ME Mar 9, 2018
Dimension-Robust MCMC in Bayesian Inverse Problems

Victor Chen, Matthew M. Dunlop, Omiros Papaspiliopoulos et al.

The methodology developed in this article is motivated by a wide range of prediction and uncertainty quantification problems that arise in Statistics, Machine Learning and Applied Mathematics, such as non-parametric regression, multi-class classification and inversion of partial differential equations. One popular formulation of such problems is as Bayesian inverse problems, where a prior distribution is used to regularize inference on a high-dimensional latent state, typically a function or a field. It is common that such priors are non-Gaussian, for example piecewise-constant or heavy-tailed, and/or hierarchical, in the sense of involving a further set of low-dimensional parameters, which, for example, control the scale or smoothness of the latent state. In this formulation, prediction and uncertainty quantification rely on efficient exploration of the posterior distribution of latent states and parameters. This article introduces a framework for efficient MCMC sampling in Bayesian inverse problems that capitalizes upon two fundamental ideas in MCMC: non-centred parameterisations of hierarchical models and dimension-robust samplers for latent Gaussian processes. Using a range of diverse applications, we show that the proposed framework is dimension-robust, that is, the efficiency of the MCMC sampling does not deteriorate as the dimension of the latent state increases. We showcase the full potential of the machinery we develop in the article in semi-supervised multi-class classification, where our sampling algorithm is used within an active learning framework to guide the selection of input data to manually label in order to achieve high predictive accuracy with a minimal number of labelled data points.
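
To make the idea of a dimension-robust sampler concrete, here is a minimal sketch of the preconditioned Crank-Nicolson (pCN) method, one well-known sampler of this kind for a standard Gaussian prior. The abstract does not say which sampler the article uses, so treat this as an illustration only; the toy likelihood `phi`, the step size `beta`, and the function name `pcn_sample` are assumptions.

```python
import math
import random

def pcn_sample(phi, dim, n_steps, beta=0.5, seed=0):
    """Run pCN for a posterior proportional to exp(-phi(u)) * N(u; 0, I).

    The proposal v = sqrt(1 - beta^2) * u + beta * xi, with xi ~ N(0, I),
    leaves the prior invariant, so the accept ratio involves only the
    likelihood potential phi and not the prior density.
    """
    rng = random.Random(seed)
    u = [0.0] * dim
    phi_u = phi(u)
    shrink = math.sqrt(1.0 - beta * beta)
    samples, accepted = [], 0
    for _ in range(n_steps):
        v = [shrink * ui + beta * rng.gauss(0.0, 1.0) for ui in u]
        phi_v = phi(v)
        # Accept with probability min(1, exp(phi(u) - phi(v))).
        if math.log(1.0 - rng.random()) < phi_u - phi_v:
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(u)
    return samples, accepted / n_steps

def phi(u):
    """Hypothetical toy potential: one noisy observation y = 1 of u[0], unit noise."""
    return 0.5 * (1.0 - u[0]) ** 2

samples, accept_rate = pcn_sample(phi, dim=20, n_steps=5000)
mean_u0 = sum(s[0] for s in samples) / len(samples)
```

For this toy problem the exact posterior of `u[0]` is Gaussian with mean 0.5 and variance 0.5, so `mean_u0` should land near 0.5. Because the accept ratio depends only on `phi`, which here involves a single coordinate, the acceptance behaviour is the same for any `dim`.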