34.3CVApr 13
HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language ModelsXinyun Liu
Large vision-language models (LVLMs) achieve strong multimodal performance, but still suffer from hallucinations caused by unstable visual grounding and over-reliance on language priors. Existing training-free decoding methods typically apply calibration at every decoding step, introducing unnecessary computation and potentially disrupting stable predictions. We address this problem by identifying layer-wise hesitation, a simple signal of grounding instability reflected by fluctuations in token preference across intermediate layers. Based on this observation, we propose Hesitation-Triggered Differential Calibration (HTDC), a training-free decoding framework that preserves standard full-branch inference and activates calibration only at hesitation-prone steps. When triggered, HTDC contrasts the full branch with two lightweight probes, a visual-nullification probe and a semantic-nullification probe, to suppress hallucination-prone candidates while avoiding unnecessary intervention on stable steps. Experiments on representative hallucination benchmarks show that HTDC consistently reduces hallucinations while maintaining strong task accuracy, achieving a favorable trade-off between effectiveness and computational overhead.
HCFeb 14, 2025
Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language ModelsJiexin Ding, Bowen Zhao, Yuntao Wang et al.
English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method can achieve an accuracy of 97.6%, and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvement in willingness to use and usefulness compared to baseline methods.
LGJun 1, 2025
Generalizable LLM Learning of Graph Synthetic Data with Post-training AlignmentYizhuo Zhang, Heng Wang, Shangbin Feng et al. · tsinghua, uw
Previous research has sought to enhance the graph reasoning capabilities of LLMs by supervised fine-tuning on synthetic graph data. While these led to specialized LLMs better at solving graph algorithm problems, we don't need LLMs for shortest path: we need generalization from synthetic graph data to real-world tasks with implicit graph structures. In this work, we propose to unlock generalizable learning of graph with post-training alignment with synthetic data. We first design solution-based and process-based rewards for synthetic graph problems: instead of rigid memorizing response patterns in direct fine-tuning, we posit that post-training alignment would help LLMs grasp the essentials underlying graph reasoning and alleviate overfitting on synthetic data. We employ post-training alignment algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data. We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures such as multi-hop QA, structured planning, and more. Extensive experiments demonstrate that our post-training alignment recipe leads to statistically significant improvement on 5 datasets, with an average gain of 12.9% over baseline settings. Further analysis reveals that process-based rewards consistently outperform solution-based rewards on synthetic data but not on real-world tasks, and compositionality and explainable intermediate steps remains a critical challenge even after post-training alignment.
IVJun 13, 2024
AGFA-Net: Attention-Guided and Feature-Aggregated Network for Coronary Artery Segmentation using Computed Tomography AngiographyXinyun Liu, Chen Zhao
Coronary artery disease (CAD) remains a prevalent cardiovascular condition, posing significant health risks worldwide. This pathology, characterized by plaque accumulation in coronary artery walls, leads to myocardial ischemia and various symptoms, including chest pain and shortness of breath. Accurate segmentation of coronary arteries from coronary computed tomography angiography (CCTA) images is crucial for diagnosis and treatment planning. Traditional segmentation methods face challenges in handling low-contrast images and complex anatomical structures. In this study, we propose an attention-guided, feature-aggregated 3D deep network (AGFA-Net) for coronary artery segmentation using CCTA images. AGFA-Net leverages attention mechanisms and feature refinement modules to capture salient features and enhance segmentation accuracy. Evaluation on a dataset comprising 1,000 CCTA scans demonstrates AGFA-Net's superior performance, achieving an average Dice coefficient similarity of 86.74% and a Hausdorff distance of 0.23 mm during 5-fold cross-validation. Ablation studies further validate the effectiveness of the proposed modules, highlighting their contributions to improved segmentation accuracy. Overall, AGFA-Net offers a robust and reliable solution for coronary artery segmentation, addressing challenges posed by varying vessel sizes, complex anatomies, and low image contrast.