Sicheng Lu

28.6 · HC · Mar 15

Gamifying Compassion: Mitigating Dialect Prejudice Through An AI-Driven Serious Game

Sicheng Lu, Erick Purwanto, Hong Liu et al.

Dialect bias is pervasive yet often unconscious, normalized, or obscured by masking. Existing HCI interventions primarily audit disparities and propose reactive fixes. We present CompassioMate, a dialect-aware serious game that nurtures perspective-taking through AI-mediated play. Players listen to audio samples to identify regional dialects, engage in simulated social interactions involving dialect discrimination, and explore branching narratives that reveal how changes in wording or stance can influence outcomes. In a three-week field study with 20 university students, participants reported feeling comfortable when observing region-tailored dialogues, and several described experiencing a change in perspective. We contribute: 1) a formative study identifying goals for safe action consequence modelling; 2) the design and evaluation of a serious game integrating dialect audio, region-mapping play, bias; and 3) design implications highlighting listener-side training, transparent evaluation, and narratives maintaining psychological well-being.

59.4 · CV · Mar 31

Scaling Video Pretraining for Surgical Foundation Models

Sicheng Lu, Zikai Xiao, Jianhui Wei et al.

Surgical video understanding is essential for computer-assisted interventions, yet existing surgical foundation models remain constrained by limited data scale, procedural diversity, and inconsistent evaluation, often lacking a reproducible training pipeline. We propose SurgRec, a scalable and reproducible pretraining recipe for surgical video understanding, instantiated with two variants: SurgRec-MAE and SurgRec-JEPA. We curate a large multi-source corpus of 10,535 videos and 214.5M frames spanning endoscopy, laparoscopy, cataract, and robotic surgery. Building on this corpus, we develop a unified pretraining pipeline with balanced sampling and standardize a reproducible benchmark across 16 downstream datasets and four clinical domains with consistent data splits. Across extensive comparisons against SSL baselines and vision-language models, SurgRec consistently achieves superior performance across downstream datasets. In contrast, VLMs prove unreliable for fine-grained temporal recognition, exhibiting both performance gaps and sensitivity to prompt phrasing. Our work provides a reproducible, scalable foundation for the community to build more general surgical video models. All code, models, and data will be publicly released.
