Chengfei Wu

h-index2
2papers

2 Papers

CLNov 15, 2023Code
"We Demand Justice!": Towards Social Context Grounding of Political Texts

Rajkumar Pujari, Chengfei Wu, Dan Goldwasser

Social media discourse frequently consists of 'seemingly similar language used by opposing sides of the political spectrum', often translating to starkly contrasting perspectives. E.g., 'thoughts and prayers', could express sympathy for mass-shooting victims, or criticize the lack of legislative action on the issue. This paper defines the context required to fully understand such ambiguous statements in a computational setting and ground them in real-world entities, actions, and attitudes. We propose two challenging datasets that require an understanding of the real-world context of the text. We benchmark these datasets against models built upon large pre-trained models, such as RoBERTa and GPT-3. Additionally, we develop and benchmark more structured models building upon existing Discourse Contextualization Framework and Political Actor Representation models. We analyze the datasets and the predictions to obtain further insights into the pragmatic language understanding challenges posed by the proposed social grounding tasks.

CVJul 9, 2025
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning

Chengfei Wu, Ronald Seoh, Bingxuan Li et al.

Recent advances in large vision-language models have led to impressive performance in visual question answering and multimodal reasoning. However, it remains unclear whether these models genuinely perform grounded visual reasoning or rely on superficial patterns and dataset biases. In this work, we introduce MagiC, a comprehensive benchmark designed to evaluate grounded multimodal cognition, assessing not only answer accuracy but also the quality of step-by-step reasoning and its alignment with relevant visual evidence. Our benchmark includes approximately 5,500 weakly supervised QA examples generated from strong model outputs and 900 human-curated examples with fine-grained annotations, including answers, rationales, and bounding box groundings. We evaluate 15 vision-language models ranging from 7B to 70B parameters across four dimensions: final answer correctness, reasoning validity, grounding fidelity, and self-correction ability. MagiC further includes diagnostic settings to probe model robustness under adversarial visual cues and assess their capacity for introspective error correction. We introduce new metrics such as MagiScore and StepSense, and provide comprehensive analyses that reveal key limitations and opportunities in current approaches to grounded visual reasoning.