CR · May 29
LLM Anonymization Against Agentic Re-Identification
Ziwen Li, Jianing Wen, Tianshi Li
Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retention. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and the mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.
CR · May 20
PrivacyMotiv: Speculative Persona Journeys for Empathic and Motivating Privacy Reviews in UX Design
Zeya Chen, Jianing Wen, Yaxing Yao et al.
UX professionals routinely conduct design reviews, yet privacy concerns are often overlooked, not only because of limited tools but, more fundamentally, because of low intrinsic motivation, driven by limited privacy knowledge, weak empathy for unexpectedly affected users, and low autonomy in identifying harms. We present PrivacyMotiv, an LLM-powered system that generates vulnerability-centered personas, persona journey stories, and traceable design diagnoses grounded in lo-fi user flows to support privacy-oriented UX design review. In a within-subjects study with professional UX practitioners (N=16), PrivacyMotiv significantly improved empathy, intrinsic motivation, and perceived usefulness, with participants identifying 59% more privacy issues and proposing 70% more redesign solutions compared to self-proposed methods. This work contributes empirical insight into motivational barriers in privacy-aware UX and a structured, narrative-driven approach for integrating privacy review into early-stage UX practice.
CL · Nov 5, 2025
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
Kaiyuan Zhang, Chenghao Yang, Zhoufutu Wen et al.
As reasoning models scale rapidly, the essential role of multimodality in human cognition has come into sharp relief, driving a growing need to probe vision-centric cognitive behaviors. Yet, existing multimodal benchmarks either overemphasize textual reasoning or fall short of systematically capturing vision-centric cognitive behaviors, leaving the cognitive capacity of MLLMs insufficiently assessed. To address this limitation, we introduce MME-CC (Multi-Modal Evaluation benchmark of Cognitive Capacity), a vision-grounded benchmark that organizes 11 representative reasoning tasks into three fundamental categories of visual information: spatial, geometric, and knowledge-based reasoning, and provides fine-grained analyses of MLLMs' cognitive capacity across these dimensions. Based on MME-CC, we conduct extensive experiments over 16 representative MLLMs. Our study reveals that closed-source models currently lead overall (e.g., 42.66 for Gemini-2.5-Pro vs. 30.45 for GLM-4.5V), while spatial and geometric reasoning remain broadly weak (less than or equal to 30%). We further identify common error patterns, including orientation mistakes, fragile cross-view identity persistence, and poor adherence to counterfactual instructions, and observe that Chain-of-Thought typically follows a three-stage process (extract -> reason -> verify) with heavy reliance on visual extraction. We hope this work catalyzes a shift toward treating the cognitive capacity of MLLMs as central to both evaluation and model design.