60.7AIMay 28
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training TracesChen He, Yuhao Wu, Lei Wang et al.
Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: a continuation where the answer appears sufficiently supported, but the trace continues with additional reasoning that remains in the supervised target. To test its training effect, we use a delete-only editor to construct answer-preserving suffix removal and compare CoT-based SFT on the original and processed traces. We observe improved SFT outcomes after removing the editor-identified post-conclusion continuation, suggesting that this continuation is harmful to training in our setting. We therefore refer to this empirically supported phenomenon as harmful continuation. Beyond this intervention, we further characterize the removed post-conclusion continuation through uncertainty and hidden-state progress. We observe persistent local uncertainty together with weakened terminal-directional progress, forming an uncertainty--geometry mismatch. Finally, we instantiate Harmful Continuation Cut (HCC), a lightweight boundary proxy that approximates the editor-identified post-conclusion continuation boundary.
65.0LGMay 26
Diffuse to Detect: Generative Diffusion Models for Unsupervised IC Anomaly DetectionYuxuan Yin, Chen He, Todd Jacobs et al.
Latent defect screening is challenged by extremely low failure rates, high-dimensional test data, and absence of labeled anomalies. We propose the first unsupervised anomaly detection framework incorporating a Diffusion Transformer. Raw test measurements are first compressed by an autoencoder, then reshaped into a structured token sequence enriched with sinusoidal and per-device wafer-position embeddings. Anomaly scores are derived from the noise-prediction error over mid-range diffusion timesteps, enabling fast wafer-scale screening without any labeled defects or manual feature engineering. Our approach achieves state-of-the-art performance on industrial 16nm IC test data under extreme class imbalance, offering interpretable failure localization through latent-space reconstruction residuals.
AINov 9, 2025
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language ModelsChen He, Xun Jiang, Lei Wang et al.
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of reasoning tasks. Recent methods have further improved LLM performance in complex mathematical reasoning. However, when extending these methods beyond the domain of mathematical reasoning to tasks involving complex domain-specific knowledge, we observe a consistent failure of LLMs to generate novel insights during the reflection stage. Instead of conducting genuine cognitive refinement, the model tends to mechanically reiterate earlier reasoning steps without introducing new information or perspectives, a phenomenon referred to as "Echo Reflection". We attribute this behavior to two key defects: (1) Uncontrollable information flow during response generation, which allows premature intermediate thoughts to propagate unchecked and distort final decisions; (2) Insufficient exploration of internal knowledge during reflection, leading to repeating earlier findings rather than generating new cognitive insights. Building on these findings, we proposed a novel reinforcement learning method termed Adaptive Entropy Policy Optimization (AEPO). Specifically, the AEPO framework consists of two major components: (1) Reflection-aware Information Filtration, which quantifies the cognitive information flow and prevents the final answer from being affected by earlier bad cognitive information; (2) Adaptive-Entropy Optimization, which dynamically balances exploration and exploitation across different reasoning stages, promoting both reflective diversity and answer correctness. Extensive experiments demonstrate that AEPO consistently achieves state-of-the-art performance over mainstream reinforcement learning baselines across diverse benchmarks.
ROApr 15, 2024Code
DIDLM: A SLAM Dataset for Difficult Scenarios Featuring Infrared, Depth Cameras, LIDAR, 4D Radar, and Others under Adverse Weather, Low Light Conditions, and Rough RoadsWeisheng Gong, Kaijie Su, Qingyong Li et al.
Adverse weather conditions, low-light environments, and bumpy road surfaces pose significant challenges to SLAM in robotic navigation and autonomous driving. Existing datasets in this field predominantly rely on single sensors or combinations of LiDAR, cameras, and IMUs. However, 4D millimeter-wave radar demonstrates robustness in adverse weather, infrared cameras excel in capturing details under low-light conditions, and depth images provide richer spatial information. Multi-sensor fusion methods also show potential for better adaptation to bumpy roads. Despite some SLAM studies incorporating these sensors and conditions, there remains a lack of comprehensive datasets addressing low-light environments and bumpy road conditions, or featuring a sufficiently diverse range of sensor data. In this study, we introduce a multi-sensor dataset covering challenging scenarios such as snowy weather, rainy weather, nighttime conditions, speed bumps, and rough terrains. The dataset includes rarely utilized sensors for extreme conditions, such as 4D millimeter-wave radar, infrared cameras, and depth cameras, alongside 3D LiDAR, RGB cameras, GPS, and IMU. It supports both autonomous driving and ground robot applications and provides reliable GPS/INS ground truth data, covering structured and semi-structured terrains. We evaluated various SLAM algorithms using this dataset, including RGB images, infrared images, depth images, LiDAR, and 4D millimeter-wave radar. The dataset spans a total of 18.5 km, 69 minutes, and approximately 660 GB, offering a valuable resource for advancing SLAM research under complex and extreme conditions. Our dataset is available at https://github.com/GongWeiSheng/DIDLM.
SYMay 3, 2024
Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile RegressionYuxuan Yin, Xiaoxiao Wang, Rebecca Chen et al.
Predicting the minimum operating voltage ($V_{min}$) of chips is one of the important techniques for improving the manufacturing testing flow, as well as ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variations. While some existing techniques offer region predictions, but they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free $V_{min}$ interval estimation methodology possessing a theoretical guarantee of coverage. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors can reduce the interval length significantly for $V_{min}$ prediction.
LGAug 21, 2025
Transfer Learning for Minimum Operating Voltage Prediction in Advanced Technology Nodes: Leveraging Legacy Data and Silicon Odometer SensingYuxuan Yin, Rebecca Chen, Boxun Xu et al.
Accurate prediction of chip performance is critical for ensuring energy efficiency and reliability in semiconductor manufacturing. However, developing minimum operating voltage ($V_{min}$) prediction models at advanced technology nodes is challenging due to limited training data and the complex relationship between process variations and $V_{min}$. To address these issues, we propose a novel transfer learning framework that leverages abundant legacy data from the 16nm technology node to enable accurate $V_{min}$ prediction at the advanced 5nm node. A key innovation of our approach is the integration of input features derived from on-chip silicon odometer sensor data, which provide fine-grained characterization of localized process variations -- an essential factor at the 5nm node -- resulting in significantly improved prediction accuracy.
CVAug 3, 2025
GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video RetrievalBowen Yang, Yun Cao, Chen He et al.
Text-to-video retrieval requires precise alignment between language and temporally rich audio-video signals. However, existing methods often emphasize visual cues while underutilizing audio semantics or relying on coarse fusion strategies, resulting in suboptimal multimodal representations. We introduce GAIS, a retrieval framework that strengthens multimodal alignment from both representation and regularization perspectives. First, a Frame-level Gated Fusion (FGF) module adaptively integrates audio-visual features under textual guidance, enabling fine-grained temporal selection of informative frames. Second, a Semantic Variance-Scaled Perturbation (SVSP) mechanism regularizes the text embedding space by controlling perturbation magnitude in a semantics-aware manner. These two modules are complementary: FGF minimizes modality gaps through selective fusion, while SVSP improves embedding stability and discrimination. Extensive experiments on MSR-VTT, DiDeMo, LSMDC, and VATEX demonstrate that GAIS consistently outperforms strong baselines across multiple retrieval metrics while maintaining notable computational efficiency.
CVJul 30, 2025
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group AttentionYuqi Pang, Bowen Yang, Yun Cao et al.
Vision large language models (VLLMs) are focusing primarily on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models. However, these approaches face high training and inference costs, as well as challenges in extracting visual details, effectively bridging across modalities. In this work, we propose a novel visual framework, MoCHA, to address these issues. Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module to dynamically select experts tailored to different visual dimensions. To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we further design a Hierarchical Group Attention (HGA) with intra- and inter-group operations and an adaptive gating strategy for encoded visual features. We train MoCHA on two mainstream LLMs (e.g., Phi2-2.7B and Vicuna-7B) and evaluate their performance across various benchmarks. Notably, MoCHA outperforms state-of-the-art open-weight models on various tasks. For example, compared to CuMo (Mistral-7B), our MoCHA (Phi2-2.7B) presents outstanding abilities to mitigate hallucination by showing improvements of 3.25% in POPE and to follow visual instructions by raising 153 points on MME. Finally, ablation studies further confirm the effectiveness and robustness of the proposed MoECs and HGA in improving the overall performance of MoCHA.
HCAug 2, 2021
Interactive Visual Facets to Support Fluid Exploratory SearchChen He, Luana Micallef, Barış Serim et al.
Exploratory search starts with ill-defined goals and involves browsing, learning, and formulating new targets for search. To fluidly support such dynamic search behaviours, we focus on devising interactive visual facets (IVF), visualising information facets to support user comprehension and control of the information space. To do this, we reviewed existing faceted search interfaces and derived two design requirements (DR) that have not been fully addressed to support fluid interactions in exploratory search. We then exemplified the requirements through devising an IVF tool, which coordinates a linear and a categorical facet representing the distribution and summarisation of items, respectively, and providing context for faceted exploration (DR1). To support rapid transitions between search criteria (DR2), the tool introduces a novel design concept of using facets to select items without filtering the item space. Particularly, we propose a filter-swipe technique that enables users to drag a categorical facet value sequentially over linear facet bars to view the items in the intersection of the two facets along with the categorical facet dynamically summarizing the items in the interaction. Three applications demonstrate how the features support information discovery with ease. A user study of 11 participants with realistic email search tasks shows that dynamic suggestions through the timeline navigation can help discover useful suggestions for search; the novel design concept was favoured over using facet values as filters. Based on these practices, we derive IVF design implications for fluid, exploratory searches.
HCOct 12, 2020
Characterizing the Quality of Insight by Interactions: A Case StudyChen He, Luana Micallef, Liye He et al.
Understanding the quality of insight has become increasingly important with the trend of allowing users to post comments during visual exploration, yet approaches for qualifying insight are rare. This paper presents a case study to investigate the possibility of characterizing the quality of insight via the interactions performed. To do this, we devised the interaction of a visualization tool-MediSyn-for insight generation. MediSyn supports five types of interactions: selecting, connecting, elaborating, exploring, and sharing. We evaluated MediSyn with 14 participants by allowing them to freely explore the data and generate insights. We then extracted seven interaction patterns from their interaction logs and correlated the patterns to four aspects of insight quality. The results show the possibility of qualifying insights via interactions. Among other findings, exploration actions can lead to unexpected insights; the drill-down pattern tends to increase the domain values of insights. A qualitative analysis shows that using domain knowledge to guide exploration can positively affect the domain value of derived insights. We discuss the study's implications, lessons learned, and future research opportunities.
LGJul 29, 2019
A Deep Learning Based Attack for The Chaos-based Image EncryptionChen He, Kan Ming, Yongwei Wang et al.
In this letter, as a proof of concept, we propose a deep learning-based approach to attack the chaos-based image encryption algorithm in \cite{guan2005chaos}. The proposed method first projects the chaos-based encrypted images into the low-dimensional feature space, where essential information of plain images has been largely preserved. With the low-dimensional features, a deconvolutional generator is utilized to regenerate perceptually similar decrypted images to approximate the plain images in the high-dimensional space. Compared with conventional image encryption attack algorithms, the proposed method does not require to manually analyze and infer keys in a time-consuming way. Instead, we directly attack the chaos-based encryption algorithms in a key-independent manner. Moreover, the proposed method can be trained end-to-end. Given the chaos-based encrypted images, a well-trained decryption model is able to automatically reconstruct plain images with high fidelity. In the experiments, we successfully attack the chaos-based algorithm \cite{guan2005chaos} and the decrypted images are visually similar to their ground truth plain images. Experimental results on both static-key and dynamic-key scenarios verify the efficacy of the proposed method.