CL | May 29
Fine-Tuning Improves Information Conveyance in Language Models
Yuwei Cheng, Weiyi Tian, Haifeng Xu
Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an entire generation rollout. To address this, we propose Canopy Entropy ($\mathrm{CE}^\star$), a measure that views language generation from a tree perspective, where the "canopy" represents the space of all possible rollouts, so that $\mathrm{CE}^\star$ naturally quantifies the effective size of the generation space. $\mathrm{CE}^\star$ jointly captures uncertainty in both the output length $N$ and the generated sequence $Y_{1:N}$; indeed, we show that it equals the total Shannon entropy $H(N, Y_{1:N}\mid X)$, where $X$ denotes the prompt. This formulation yields interpretable metrics, including a length-entropy correlation term $ρ(N, r_N)$, where $r_N$ is the entropy rate, which quantifies information conveyance efficiency by indicating whether longer outputs are more or less informative per token. Empirically, across tasks and model families, we find that fine-tuned models consistently exhibit a stronger positive correlation $ρ(N, r_N)$, even when total entropy decreases. Furthermore, after controlling for model family, task, prompt, and output-length effects, we find that fine-tuning nearly triples the correlation strength between entropy rate and semantic diversity, suggesting that aligned models convert token uncertainty into semantic diversity more efficiently. Overall, these results demonstrate that fine-tuning does not simply reduce uncertainty but fundamentally reorganizes it into more informative and semantically meaningful generations. Our code is available at https://github.com/WeiyiTian/canopy-entropy.
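A toy sketch of the quantities named above, for one fixed prompt $X$. The rollout distribution is made up, and two definitions are assumptions rather than the authors' implementation: the entropy rate is taken as $r_n = H(Y_{1:N}\mid N=n, X)/n$, and $ρ(N, r_N)$ is taken as a Pearson correlation over lengths weighted by $P(N=n\mid X)$. Because $N$ is determined by the sequence itself, the joint entropy equals the sequence entropy, and the chain rule splits it into a length term and a content term.

```python
import math
from collections import defaultdict

# Hypothetical rollout distribution p(y_{1:N} | x) for a single prompt x.
rollouts = {
    ("a",): 0.2,
    ("a", "a"): 0.2,
    ("a", "b"): 0.2,
    ("a", "b", "a"): 0.1,
    ("a", "b", "b"): 0.1,
    ("b", "a", "a"): 0.1,
    ("b", "b", "b"): 0.1,
}

def entropy(probs):
    """Shannon entropy (bits) of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Total entropy H(N, Y_{1:N} | x); N is a function of the sequence, so this is the sequence entropy.
h_joint = entropy(rollouts.values())

# Length marginal P(N = n | x) and per-length conditional entropies H(Y_{1:N} | N = n, x).
p_len = defaultdict(float)
for seq, p in rollouts.items():
    p_len[len(seq)] += p
h_given_len = {
    n: entropy([p / pn for seq, p in rollouts.items() if len(seq) == n])
    for n, pn in p_len.items()
}
h_len = entropy(p_len.values())
h_cond = sum(p_len[n] * h for n, h in h_given_len.items())

# Chain rule: H(N, Y_{1:N} | x) = H(N | x) + H(Y_{1:N} | N, x).
assert math.isclose(h_joint, h_len + h_cond)

# Assumed entropy rate per length: r_n = H(Y_{1:N} | N = n, x) / n.
rates = {n: h / n for n, h in h_given_len.items()}

def weighted_corr(xs, ys, ws):
    """Pearson correlation of xs and ys under (normalized) weights ws."""
    total = sum(ws)
    ws = [w / total for w in ws]
    mx = sum(w * x for x, w in zip(xs, ws))
    my = sum(w * y for y, w in zip(ys, ws))
    cov = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    vx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    vy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws))
    return cov / math.sqrt(vx * vy)

lengths = sorted(p_len)
rho = weighted_corr(lengths, [rates[n] for n in lengths], [p_len[n] for n in lengths])
print(f"H(N,Y)={h_joint:.4f}  H(N)={h_len:.4f}  H(Y|N)={h_cond:.4f}  rho={rho:.3f}")
```

In this toy distribution, longer rollouts carry more entropy per token, so the weighted correlation comes out positive; a real estimate would be computed from sampled generations rather than an enumerated tree.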
CV | Apr 9, 2024
Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data
Kai Luan, Chenghao Shi, Neng Wang et al.
The millimeter-wave (mmWave) radar sensor maintains stable performance under adverse environmental conditions, making it a promising solution for all-weather perception tasks such as outdoor mobile robotics. However, radar point clouds are relatively sparse and contain many ghost points, which greatly limits the development of mmWave radar technology. In this paper, we propose a novel point cloud super-resolution approach for 3D mmWave radar data, named Radar-diffusion. Our approach employs a diffusion model defined by mean-reverting stochastic differential equations (SDEs). Using our proposed new objective function with supervision from corresponding LiDAR point clouds, our approach efficiently handles radar ghost points and enhances sparse mmWave radar point clouds into dense, LiDAR-like point clouds. We evaluate our approach on two different datasets, and the experimental results show that our method outperforms state-of-the-art baseline methods in 3D radar super-resolution tasks. Furthermore, we demonstrate that our enhanced radar point clouds can support downstream radar point-based registration tasks.
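For readers unfamiliar with mean-reverting SDEs, here is a scalar sketch of a generic Ornstein-Uhlenbeck-type process $dx = θ(μ - x)\,dt + σ\,dW$, simulated with the Euler-Maruyama scheme. The constant $θ$ and $σ$, the scalar state, and the function name are illustrative assumptions; Radar-diffusion's actual noise schedules, point-cloud representation, and LiDAR-supervised objective are not reproduced here. The drift pulls the state toward the mean $μ$ at rate $θ$, while $σ$ injects Gaussian noise.

```python
import math
import random

def euler_maruyama_mean_reverting(x0, mu, theta, sigma, dt, steps, rng=None):
    """Simulate dx = theta * (mu - x) dt + sigma dW with Euler-Maruyama.

    Hypothetical helper for illustration; returns the full path [x_0, ..., x_steps].
    """
    if rng is None:
        rng = random.Random(0)
    path = [x0]
    x = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + theta * (mu - x) * dt + sigma * dw
        path.append(x)
    return path

# Deterministic case (sigma = 0): the state decays from 5.0 toward the mean 1.0.
path = euler_maruyama_mean_reverting(5.0, 1.0, 2.0, 0.0, 0.01, 1000)
```

With $σ = 0$ the path decays deterministically to $μ$; with $σ > 0$ it fluctuates around $μ$, with stationary variance $σ^2/(2θ)$ for the continuous-time process.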
LG | May 18, 2024
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
Yuwei Cheng, Fan Yao, Xuefeng Liu et al.
This paper studies Learning from Imperfect Human Feedback (LIHF), addressing potential irrationality or imperfect perception when learning from comparative human feedback. Building on evidence that human imperfection decays over time (i.e., humans learn to improve), we cast this problem as a concave-utility continuous-action dueling bandit under a restricted form of corruption: the corruption scale decays over time as $t^{ρ-1}$ for some "imperfection rate" $ρ\in [0, 1]$. With $T$ as the total number of iterations, we establish a regret lower bound of $Ω(\max\{\sqrt{T}, T^ρ\})$ for LIHF, even when $ρ$ is known. For the same setting, we develop the Robustified Stochastic Mirror Descent for Imperfect Dueling (RoSMID) algorithm, which achieves a nearly optimal regret of $\tilde{\mathcal{O}}(\max\{\sqrt{T}, T^ρ\})$. Core to our analysis is a novel framework for analyzing gradient-based algorithms for dueling bandits under corruption, and we demonstrate its general applicability by showing how it can be easily applied to obtain corruption-robust guarantees for other popular gradient-based dueling bandit algorithms. Our theoretical results are validated by extensive experiments.
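One way to see where the $T^ρ$ term comes from: if the corruption at round $t$ has scale $t^{ρ-1}$, the cumulative corruption over $T$ rounds is $\sum_{t=1}^{T} t^{ρ-1}$, which grows like $T^ρ/ρ$ for $ρ\in(0, 1]$. The short check below compares the two numerically; it illustrates the growth rate only and is not the paper's lower-bound argument or the RoSMID algorithm (the function names are hypothetical). Since the bound is $\max\{\sqrt{T}, T^ρ\}$, the corruption term dominates only when $ρ > 1/2$.

```python
def cumulative_corruption(T, rho):
    """Sum of per-round corruption scales t**(rho - 1) for t = 1..T."""
    return sum(t ** (rho - 1) for t in range(1, T + 1))

def integral_approx(T, rho):
    """Leading-order growth T**rho / rho of the sum above (requires rho > 0)."""
    if rho <= 0:
        raise ValueError("rho must be positive; for rho = 0 the sum grows like log T")
    return T ** rho / rho

T = 100_000
for rho in (0.25, 0.5, 0.75, 1.0):
    ratio = cumulative_corruption(T, rho) / integral_approx(T, rho)
    print(f"rho={rho:.2f}  sum / (T^rho / rho) = {ratio:.4f}")
```

The ratio approaches 1 as $T$ grows; convergence is slower for small $ρ$ because the lower-order constant in the sum matters more relative to $T^ρ$.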
HC | Sep 15, 2025
Can LLMs Address Mental Health Questions? A Comparison with Human Therapists
Synthia Wang, Yuwei Cheng, Austin Song et al.
Limited access to mental health care has motivated the use of digital tools and conversational agents powered by large language models (LLMs), yet their quality and reception remain unclear. We present a study comparing therapist-written responses to those generated by ChatGPT, Gemini, and Llama for real patient questions. Text analysis showed that LLMs produced longer, more readable, and lexically richer responses with a more positive tone, while therapist responses were more often written in the first person. In a survey with 150 users and 23 licensed therapists, participants rated LLM responses as clearer, more respectful, and more supportive than therapist-written answers. Yet, both groups of participants expressed a stronger preference for human therapist support. These findings highlight the promise and limitations of LLMs in mental health, underscoring the need for designs that balance their communicative strengths with concerns of trust, privacy, and accountability.
LG | Oct 22, 2025
Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
Yuwei Cheng, Zifeng Zhao, Haifeng Xu
Online advertising platforms use automated auctions to connect advertisers with potential customers, requiring effective bidding strategies to maximize profits. Accurate ad impact estimation requires considering three key factors: delayed and long-term effects, cumulative ad impacts such as reinforcement or fatigue, and customer heterogeneity. However, previous studies often do not address these effects jointly. To capture these factors, we model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies, whose estimation error is controlled in terms of the first-stage estimator's (in)accuracy. Building on this, we design a reinforcement learning algorithm to derive efficient personalized bidding strategies. This approach achieves a near-optimal regret bound of $\tilde{O}(dH^2\sqrt{T})$, where $d$ is the contextual dimension, $H$ is the number of rounds, and $T$ is the number of customers. Our theoretical findings are validated by simulation experiments.
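The two-stage, data-splitting idea can be illustrated with a deliberately simplified, hypothetical Poisson model; it ignores delays, cumulative exposure, and context, so it is not the paper's CMDP estimator. Stage 1 estimates a baseline conversion rate $λ_0$ on one split (control customers); stage 2 plugs $\hat λ_0$ into the likelihood $y \sim \mathrm{Poisson}(λ_0 m)$ on a disjoint split (exposed customers) and maximizes over the ad multiplier $m$, whose MLE is $\bar y / \hat λ_0$. The function name, the multiplicative model, and the counts are illustrative assumptions.

```python
from statistics import fmean

def two_stage_poisson_mle(control_counts, exposed_counts):
    """Toy two-stage Poisson MLE on two disjoint data splits (hypothetical model).

    Stage 1 (control split): y ~ Poisson(lambda0)      -> lambda0_hat = mean(y).
    Stage 2 (exposed split): y ~ Poisson(lambda0 * m)  -> m_hat = mean(y) / lambda0_hat,
    treating lambda0_hat from stage 1 as fixed.
    """
    lambda0_hat = fmean(control_counts)
    if lambda0_hat <= 0:
        raise ValueError("stage-1 baseline rate must be positive")
    m_hat = fmean(exposed_counts) / lambda0_hat
    return lambda0_hat, m_hat

# Usage on hypothetical conversion counts from two disjoint customer groups.
lam0, m = two_stage_poisson_mle([1, 2, 3, 2], [4, 5, 3, 4])
```

Because $\hat m = \bar y_{\text{exposed}} / \hat λ_0$, a relative error in the stage-1 estimate carries directly into stage 2, which is the kind of dependence on first-stage (in)accuracy that an error guarantee for such an estimator has to control. Using disjoint splits keeps the stage-2 data independent of the stage-1 estimate.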
ML | Oct 18, 2025
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
Bingji Yi, Qiyuan Liu, Yuwei Cheng et al.
Synthetic data has been increasingly used to train frontier generative models. However, recent work raises key concerns that iteratively retraining a generative model on its self-generated synthetic data may progressively deteriorate model performance, a phenomenon often termed model collapse. In this paper, we investigate ways to modify this synthetic retraining process to avoid model collapse, and possibly even reverse the trend from collapse to improvement. Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse. To develop a principled understanding of this insight, we situate our analysis in the foundational linear regression setting, showing that iterative retraining with verified synthetic data can yield near-term improvements but ultimately drives the parameter estimate to the verifier's "knowledge center" in the long run. Our theory hence predicts that, unless the verifier is perfectly reliable, the early gains will plateau and may even reverse. Indeed, these theoretical insights are further confirmed by our experiments on both linear regression and Variational Autoencoders (VAEs) trained on MNIST data.
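A toy 1-D version of the linear-regression picture can make the long-run behavior concrete. The setup is an assumption for illustration, not the paper's exact model: data follow $y = θx + ε$, each round generates synthetic samples from the current estimate $θ_t$, a verifier with its own "knowledge center" $θ_v$ keeps only samples within a tolerance of its own prediction, and the slope is refit by least squares (through the origin) on the kept samples. The selection step pulls each refit toward $θ_v$, so the estimate drifts to $θ_v$ rather than to the true parameter. The function and parameter names are hypothetical.

```python
import random

def retrain_with_verifier(theta0, theta_verifier, rounds=30, n=2000, noise=1.0, tol=1.0, seed=0):
    """Toy 1-D simulation of iterative retraining on verified synthetic data (hypothetical setup).

    Each round: sample x ~ U(-1, 1), generate y = theta_t * x + N(0, noise^2) from the current
    model, keep only samples the verifier accepts (|y - theta_verifier * x| <= tol), then refit
    the slope by least squares through the origin on the kept samples.
    Returns the sequence of estimates [theta_0, theta_1, ...].
    """
    rng = random.Random(seed)
    theta = theta0
    history = [theta]
    for _ in range(rounds):
        kept = []
        for _ in range(n):
            x = rng.uniform(-1.0, 1.0)
            y = theta * x + rng.gauss(0.0, noise)
            if abs(y - theta_verifier * x) <= tol:
                kept.append((x, y))
        if not kept:
            break  # verifier rejected everything; keep the current estimate
        sxy = sum(x * y for x, y in kept)
        sxx = sum(x * x for x, _ in kept)
        theta = sxy / sxx
        history.append(theta)
    return history

improving = retrain_with_verifier(theta0=0.5, theta_verifier=0.9)
reversing = retrain_with_verifier(theta0=0.95, theta_verifier=0.9)
```

Starting from $θ_0 = 0.5$ with a verifier centered at $θ_v = 0.9$, the estimate moves toward 0.9 (a near-term gain if the truth is, say, 1.0) and then settles near it; starting from $θ_0 = 0.95$, the same loop pulls it back down toward 0.9, mirroring the point that early gains can plateau or reverse when the verifier is imperfect.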
RO | Mar 9, 2021
Are We Ready for Unmanned Surface Vehicles in Inland Waterways? The USVInland Multisensor Dataset and Benchmark
Yuwei Cheng, Mengxin Jiang, Jiannan Zhu et al.
Unmanned surface vehicles (USVs) are valuable for their ability to execute hazardous and time-consuming missions on water surfaces. Recently, USVs for inland waterways have attracted increasing attention for their potential applications in autonomous monitoring, transportation, and cleaning. However, unlike sailing in open water, navigating inland waterways poses challenges such as the complex distribution of obstacles, global positioning system (GPS) signal denial, reflections from bank-side structures, and fog over the water surface, all of which impede USV application in inland waterways. To address these problems and stimulate relevant research, we introduce USVInland, a multisensor dataset for USVs in inland waterways. USVInland was collected along a trajectory of more than 26 km in diverse real-world inland waterway scenes using various modalities, including lidar, stereo cameras, millimeter-wave radar, GPS, and inertial measurement units (IMUs). Based on the requirements and challenges in the perception and navigation of USVs for inland waterways, we build benchmarks for simultaneous localization and mapping (SLAM), stereo matching, and water segmentation. We evaluate common algorithms for these tasks to determine the influence of unique inland waterway scenes on algorithm performance. Our dataset and development tools are available online at https://www.orca-tech.cn/datasets.html.
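As background for the stereo-matching benchmark: a rectified stereo pair recovers depth from disparity via $Z = fB/d$, where $f$ is the focal length in pixels, $B$ the baseline, and $d$ the disparity in pixels. The sketch below applies that relation; the function name and the calibration values in the example are hypothetical, not USVInland's actual camera parameters.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero disparity means a point at infinity)")
    return focal_px * baseline_m / disparity_px

# Hypothetical calibration: 700 px focal length, 0.5 m baseline.
depth = disparity_to_depth(35.0, focal_px=700.0, baseline_m=0.5)
```

Because depth varies as $1/d$, a one-pixel disparity error changes the depth estimate far more for distant points (small $d$) than for nearby ones.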