NA Apr 12, 2016
A surrogate accelerated multicanonical Monte Carlo method for uncertainty quantification
Keyi Wu, Jinglai Li
In this work we consider a class of uncertainty quantification problems where the system performance or reliability is characterized by a scalar parameter $y$. The performance parameter $y$ is random due to the presence of various sources of uncertainty in the system, and our goal is to estimate the probability density function (PDF) of $y$. We propose to use the multicanonical Monte Carlo (MMC) method, a special type of adaptive importance sampling algorithm, to compute the PDF of interest. Moreover, we develop an adaptive algorithm to construct local Gaussian process surrogates to further accelerate the MMC iterations. With numerical examples, we demonstrate that the proposed method can achieve a speedup of several orders of magnitude over the standard Monte Carlo method.
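As a rough illustration of the MMC idea mentioned in the abstract, the following is a minimal sketch of a generic histogram-based multicanonical iteration for estimating the PDF of $y = f(x)$. It is not the authors' algorithm and contains no Gaussian process surrogate. The function name `mmc_density`, the random-walk Metropolis sampler, the bin layout, and all default parameters are assumptions chosen for illustration.

```python
import math
import random


def mmc_density(f, log_p, x0, lo, hi, n_bins=20, n_iters=6, n_steps=5000,
                step=1.0, seed=0):
    """Hypothetical helper: histogram-based MMC estimate of the PDF of y = f(x).

    log_p is the unnormalized log-density of the input x. The result is the
    density of y restricted to [lo, hi), one value per bin.
    """
    rng = random.Random(seed)
    width = (hi - lo) / n_bins
    theta = [1.0 / n_bins] * n_bins  # current estimate of P(y in bin b)

    def bin_of(y):
        if y < lo or y >= hi:
            return None
        return min(int((y - lo) / width), n_bins - 1)

    x, b = x0, bin_of(f(x0))
    if b is None:
        raise ValueError("f(x0) must lie inside [lo, hi)")

    for _ in range(n_iters):
        hist = [0] * n_bins
        for _ in range(n_steps):
            # Random-walk Metropolis step targeting p(x) / theta[bin(f(x))].
            x_new = x + rng.gauss(0.0, step)
            b_new = bin_of(f(x_new))
            if b_new is not None:  # proposals outside [lo, hi) are rejected
                log_ratio = (log_p(x_new) - log_p(x)
                             + math.log(theta[b]) - math.log(theta[b_new]))
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    x, b = x_new, b_new
            hist[b] += 1
        # Under the biased target, visits to bin b scale like P_b / theta_b,
        # so theta_b * hist_b is the updated estimate of P_b.
        # Unvisited bins keep their previous value before renormalization.
        raw = [t * h if h > 0 else t for t, h in zip(theta, hist)]
        total = sum(raw)
        theta = [r / total for r in raw]

    return [t / width for t in theta]


# Toy usage: x is standard normal and y = x, so the PDF of y is known.
density = mmc_density(lambda x: x, lambda x: -0.5 * x * x, 0.0, -5.0, 5.0)
```

Each iteration biases the sampler toward bins that the previous estimate considers unlikely, which is what lets the tails of the PDF be resolved with far fewer samples than plain Monte Carlo would need.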
91.6 LG Apr 13
Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis
Yuhang He, Haodong Wu, Siyi Liu et al.
Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning ability of Large Language Models (LLMs). However, its sparse outcome-based rewards pose a fundamental credit assignment problem. We analyze this problem through the joint lens of reward polarity and token entropy. Our diagnostic tool, the Four Quadrant Decomposition, isolates token updates by polarity and entropy, and controlled ablations show that reasoning improvements concentrate in the high-entropy quadrants. To justify this observation theoretically, we adapt Conditional Mutual Information to the autoregressive RLVR setting and prove that the credit a token can carry is upper-bounded by its entropy. This view yields testable predictions that reasoning gains arise primarily from high-entropy tokens, with unique roles for positive and negative updates. A gradient analysis of GRPO further reveals how uniform reward broadcast dilutes signal at high-entropy positions while over-crediting deterministic tokens. Grounded in these insights, we propose Entropy-Aware Policy Optimization (EAPO) that modulates token-level learning signals accordingly. Extensive experiments demonstrate that EAPO outperforms strong baselines across two model families.
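To make the polarity and entropy split concrete, here is a minimal sketch of how a token update might be assigned to one of four quadrants. The abstract does not give the paper's exact definitions. The entropy threshold, the use of the sign of a scalar advantage as "polarity", and the function names `token_entropy` and `quadrant` are all assumptions made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def quadrant(probs, advantage, threshold=1.0):
    """Hypothetical classifier: (polarity, entropy level) for one token update.

    threshold is an assumed cutoff in nats, not a value from the paper.
    A zero advantage is grouped with "negative" here purely for simplicity.
    """
    polarity = "positive" if advantage > 0 else "negative"
    level = "high" if token_entropy(probs) >= threshold else "low"
    return polarity, level


# A near-deterministic token in a rejected sample vs. an uncertain token
# in a rewarded sample.
print(quadrant([0.97, 0.01, 0.01, 0.01], advantage=-1.0))
print(quadrant([0.25, 0.25, 0.25, 0.25], advantage=1.0))
```

With a grouping like this, one could mask or reweight updates per quadrant, which is the kind of controlled ablation the abstract describes.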
HC May 23, 2023
Automated spacing measurement of formwork system members with 3D point cloud data
Keyi Wu, Samuel A. Prieto, Eyob Mengiste et al.
The formwork system, a type of temporary structure, plays an important role in the smooth progress and successful completion of a construction project. Ensuring that the formwork system is installed as designed is essential for construction safety and quality. Currently, the spacing between formwork system members is mostly measured with manual tools. This research proposes a framework to measure the spacing of formwork system members using 3D point cloud data to enhance the automation of this quality inspection. The novelty lies not only in the integration of the different techniques used but also in the detection and measurement of key members in the formwork system without human intervention. The proposed framework was tested on a real construction site. Five cases were investigated to compare the 3D point cloud data approach with the manual approach using traditional measuring tools. The results indicate that the 3D point cloud data approach is a promising solution and can potentially be an effective alternative to the manual approach.