Wanqi Zhang

3papers

2citations

Novelty40%

AI Score37

Ranked #116,655 of 201,018 authors (top 58%)#20,956 in CL (top 65%)

3 Papers

CLMar 18

CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Teng Pan, Yuchen Yan, Zixuan Wang et al.

Label-free reinforcement learning enables large language models to improve reasoning capabilities without ground-truth supervision, typically by treating majority-voted answers as pseudo-labels. However, we identify a critical failure mode: as training maximizes self-consistency, output diversity collapses, causing the model to confidently reinforce systematic errors that evade detection. We term this the consensus trap. To escape it, we propose CoVerRL, a framework where a single model alternates between generator and verifier roles, with each capability bootstrapping the other. Majority voting provides noisy but informative supervision for training the verifier, while the improving verifier progressively filters self-consistent errors from pseudo-labels. This co-evolution creates a virtuous cycle that maintains high reward accuracy throughout training. Experiments across Qwen and Llama model families demonstrate that CoVerRL outperforms label-free baselines by 4.7-5.9\% on mathematical reasoning benchmarks. Moreover, self-verification accuracy improves from around 55\% to over 85\%, confirming that both capabilities genuinely co-evolve.

ROMar 30

Why That Robot? A Qualitative Analysis of Justification Strategies for Robot Color Selection Across Occupational Contexts

Jiangen He, Wanqi Zhang, Jessica K. Barfield

As robots increasingly enter the workforce, human-robot interaction (HRI) must address how implicit social biases influence user preferences. This paper investigates how users rationalize their selections of robots varying in skin tone and anthropomorphic features across different occupations. By qualitatively analyzing 4,146 open-ended justifications from 1,038 participants, we map the reasoning frameworks driving robot color selection across four professional contexts. We developed and validated a comprehensive, multidimensional coding scheme via human--AI consensus ($Îº= 0.73$). Our results demonstrate that while utilitarian \textit{Functionalism} is the dominant justification strategy (52\%), participants systematically adapted these practical rationales to align with established racial and occupational stereotypes. Furthermore, we reveal that bias frequently operates beneath conscious rationalization: exposure to racial stereotype primes significantly shifted participants' color choices, yet their spoken justifications remained masked by standard affective or task-related reasoning. We also found that demographic backgrounds significantly shape justification strategies, and that robot shape strongly modulates color interpretation. Specifically, as robots become highly anthropomorphic, users increasingly retreat from functional reasoning toward \textit{Machine-Centric} de-racialization. Through these empirical results, we provide actionable design implications to help reduce the perpetuation of societal biases in future workforce robots.

CVMay 16, 2020

Towards in-store multi-person tracking using head detection and track heatmaps

Aibek Musaev, Jiangping Wang, Liang Zhu et al.

Computer vision algorithms are being implemented across a breadth of industries to enable technological innovations. In this paper, we study the problem of computer vision based customer tracking in retail industry. To this end, we introduce a dataset collected from a camera in an office environment where participants mimic various behaviors of customers in a supermarket. In addition, we describe an illustrative example of the use of this dataset for tracking participants based on a head tracking model in an effort to minimize errors due to occlusion. Furthermore, we propose a model for recognizing customers and staff based on their movement patterns. The model is evaluated using a real-world dataset collected in a supermarket over a 24-hour period that achieves 98% accuracy during training and 93% accuracy during evaluation.