Xingchen Xu

AI
h-index18
6papers
25citations
Novelty47%
AI Score43

6 Papers

AIAug 9, 2023
"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

Jin Liu, Xingchen Xu, Xi Nan et al.

Large Language Model (LLM)-based generative AI systems, such as ChatGPT, demonstrate zero-shot learning capabilities across a wide range of downstream tasks. Owing to their general-purpose nature and potential to augment or even automate job functions, these systems are poised to reshape labor market dynamics. However, predicting their precise impact \textit{a priori} is challenging, given AI's simultaneous effects on both demand and supply, as well as the strategic responses of market participants. Leveraging an extensive dataset from a leading online labor platform, we document a pronounced displacement effect and an overall contraction in submarkets where required skills closely align with core LLM functionalities. Although demand and supply both decline, the reduction in supply is comparatively smaller, thereby intensifying competition among freelancers. Notably, further analysis shows that this heightened competition is especially pronounced in programming-intensive submarkets. This pattern is attributed to skill-transition effects: by lowering the human-capital barrier to programming, ChatGPT enables incumbent freelancers to enter programming tasks. Moreover, these transitions are not homogeneous, with high-skilled freelancers contributing disproportionately to the shift. Our findings illuminate the multifaceted impacts of general-purpose AI on labor markets, highlighting not only the displacement of certain occupations but also the inducement of skill transitions within the labor supply. These insights offer practical implications for policymakers, platform operators, and workers.

AISep 25, 2023
Algorithmic Collusion or Competition: the Role of Platforms' Recommender Systems

Xingchen Xu, Stephanie Lee, Yong Tan

Recent scholarly work has extensively examined the phenomenon of algorithmic collusion driven by AI-enabled pricing algorithms. However, online platforms commonly deploy recommender systems that influence how consumers discover and purchase products, thereby shaping the reward structures faced by pricing algorithms and ultimately affecting competition dynamics and equilibrium outcomes. To address this gap in the literature and elucidate the role of recommender systems, we propose a novel repeated game framework that integrates several key components. We first develop a structural search model to characterize consumers' decision-making processes in response to varying recommendation sets. This model incorporates both observable and unobservable heterogeneity in utility and search cost functions, and is estimated using real-world data. Building on the resulting consumer model, we formulate personalized recommendation algorithms designed to maximize either platform revenue or consumer utility. We further introduce pricing algorithms for sellers and integrate all these elements to facilitate comprehensive numerical experiments. Our experimental findings reveal that a revenue-maximizing recommender system intensifies algorithmic collusion, whereas a utility-maximizing recommender system encourages more competitive pricing behavior among sellers. Intriguingly, and contrary to conventional insights from the industrial organization and choice modeling literature, increasing the size of recommendation sets under a utility-maximizing regime does not consistently enhance consumer utility. Moreover, the degree of horizontal differentiation moderates this phenomenon in unexpected ways. The "more is less" effect does not arise at low levels of differentiation, but becomes increasingly pronounced as horizontal differentiation increases.

CLDec 8, 2024Code
7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement

Pu Zhao, Xuan Shen, Zhenglun Kong et al. · harvard

Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed, adhering to principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with SOTA post-training framework and instruction data to obtain Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.

89.9CVMay 11
PhyGround: Benchmarking Physical Reasoning in Generative World Models

Juyi Lin, Arash Akbari, Yumei He et al.

Generative world models are increasingly used for video generation, where learned simulators are expected to capture the physical rules that govern real-world dynamics. However, evaluating whether generated videos actually follow these rules remains challenging. Existing physics-focused video benchmarks have made important progress, but they still face three key challenges, including the coarse evaluation frameworks that hide law-specific failures, response biases and fatigue that undermine the validity of annotation judgments, and automated evaluators that are insufficiently physics-aware or difficult to audit. To address those challenges, we introduce PhyGround, a criteria-grounded benchmark for evaluating physical reasoning in video generation. The benchmark contains 250 curated prompts, each augmented with an expected physical outcome, and a taxonomy of 13 physical laws across solid-body mechanics, fluid dynamics, and optics. Each law is operationalized through observable sub-questions to enable per-law diagnostics. We evaluate eight modern video generation models through a large-scale, quality-controlled human study, grounded on social science lab experiment design. A total of 459 annotators provided 5,796 complete annotations and over 37.4K fine-grained labels; after quality control, the retained annotations exhibited high split-half model-ranking correlations (Spearman's rho > 0.90). To support reproducible automated evaluation, we release PhyJudge-9B, an open physics-specialized VLM judge. PhyJudge-9B achieves substantially lower aggregate relative bias than Gemini-3.1-Pro (3.3% vs. 16.6%). We release prompts, human annotations, model checkpoints, and evaluation code on the project page https://phyground.github.io/.

IRFeb 29, 2024
Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines

Lijia Ma, Xingchen Xu, Yong Tan

In the domain of digital information dissemination, search engines act as pivotal conduits linking information seekers with providers. The advent of chat-based search engines utilizing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG), exemplified by Bing Chat, marks an evolutionary leap in the search ecosystem. They demonstrate metacognitive abilities in interpreting web information and crafting responses with human-like understanding and creativity. Nonetheless, the intricate nature of LLMs renders their "cognitive" processes opaque, challenging even their designers' understanding. This research aims to dissect the mechanisms through which an LLM-powered chat-based search engine, specifically Bing Chat, selects information sources for its responses. To this end, an extensive dataset has been compiled through engagements with New Bing, documenting the websites it cites alongside those listed by the conventional search engine. Employing natural language processing (NLP) techniques, the research reveals that Bing Chat exhibits a preference for content that is not only readable and formally structured, but also demonstrates lower perplexity levels, indicating a unique inclination towards text that is predictable by the underlying LLM. Further enriching our analysis, we procure an additional dataset through interactions with the GPT-4 based knowledge retrieval API, unveiling a congruent text preference between the RAG API and Bing Chat. This consensus suggests that these text preferences intrinsically emerge from the underlying language models, rather than being explicitly crafted by Bing Chat's developers. Moreover, our investigation documents a greater similarity among websites cited by RAG technologies compared to those ranked highest by conventional search engines.

IRSep 17, 2025
When Content is Goliath and Algorithm is David: The Style and Semantic Effects of Generative Search Engine

Lijia Ma, Juan Qin, Xingchen Xu et al.

Generative search engines (GEs) leverage large language models (LLMs) to deliver AI-generated summaries with website citations, establishing novel traffic acquisition channels while fundamentally altering the search engine optimization landscape. To investigate the distinctive characteristics of GEs, we collect data through interactions with Google's generative and conventional search platforms, compiling a dataset of approximately ten thousand websites across both channels. Our empirical analysis reveals that GEs exhibit preferences for citing content characterized by significantly higher predictability for underlying LLMs and greater semantic similarity among selected sources. Through controlled experiments utilizing retrieval augmented generation (RAG) APIs, we demonstrate that these citation preferences emerge from intrinsic LLM tendencies to favor content aligned with their generative expression patterns. Motivated by applications of LLMs to optimize website content, we conduct additional experimentation to explore how LLM-based content polishing by website proprietors alters AI summaries, finding that such polishing paradoxically enhances information diversity within AI summaries. Finally, to assess the user-end impact of LLM-induced information increases, we design a generative search engine and recruit Prolific participants to conduct a randomized controlled experiment involving an information-seeking and writing task. We find that higher-educated users exhibit minimal changes in their final outputs' information diversity but demonstrate significantly reduced task completion time when original sites undergo polishing. Conversely, lower-educated users primarily benefit through enhanced information density in their task outputs while maintaining similar completion times across experimental groups.