Xingyou Liu

h-index18
2papers

2 Papers

CVJul 2, 2024
Sign Language Recognition Based On Facial Expression and Hand Skeleton

Zhiyu Long, Xingyou Liu, Jiaqi Qiao et al.

Sign language is a visual language used by the deaf and dumb community to communicate. However, for most recognition methods based on monocular cameras, the recognition accuracy is low and the robustness is poor. Even if the effect is good on some data, it may perform poorly in other data with different interference due to the inability to extract effective features. To solve these problems, we propose a sign language recognition network that integrates skeleton features of hands and facial expression. Among this, we propose a hand skeleton feature extraction based on coordinate transformation to describe the shape of the hand more accurately. Moreover, by incorporating facial expression information, the accuracy and robustness of sign language recognition are finally improved, which was verified on A Dataset for Argentinian Sign Language and SEU's Chinese Sign Language Recognition Database (SEUCSLRD).

CVJun 26, 2025
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation

Xinzhuo Li, Adheesh Juvekar, Xingyou Liu et al.

Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.