HC · May 21
Sustainable Care: Designing Technologies That Support Children's Long-Term Engagement with Social Issues
JaeWon Kim, Aayushi Dangol, Rotem Landesman et al.
Children today encounter social issues -- climate change, conflict, inequality -- through digital technologies, and the design of that encounter shapes whether young people move toward lasting civic engagement or toward anxiety and withdrawal. Much of the content children see is optimized for attention through fear and urgency, with few pathways toward meaningful action -- contributing to rising distress and disengagement among young people who care deeply but feel powerless to act. This full-day workshop introduces ``sustainable care'' as a design lens, asking how technology might support children's sustained engagement with social causes without contributing to empathic distress or burnout. We invite researchers and practitioners across child-computer interaction, games, education, and youth mental health to map this landscape together and develop a research agenda for the CCI community.
LG · Sep 26, 2022
Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
Desik Rengarajan, Sapana Chaudhary, Jaewon Kim et al.
Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a few steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms called Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information, even if sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.
CV · Jun 1, 2022
Generalized Supervised Contrastive Learning
Jaewon Kim, Hyukjong Lee, Jooyoung Chang et al.
With the recent promising results of contrastive learning in the self-supervised learning paradigm, supervised contrastive learning has successfully extended these contrastive approaches to supervised contexts, outperforming cross-entropy on various datasets. However, supervised contrastive learning inherently employs label information in a binary form--either positive or negative--using a one-hot target vector. This structure struggles to adapt to methods that exploit label information as a probability distribution, such as CutMix and knowledge distillation. In this paper, we introduce a generalized supervised contrastive loss, which measures cross-entropy between label similarity and latent similarity. This concept enhances the capabilities of supervised contrastive loss by fully utilizing the label distribution and enabling the adaptation of various existing techniques for training modern neural networks. Leveraging this generalized supervised contrastive loss, we construct a tailored framework: the Generalized Supervised Contrastive Learning (GenSCL). Compared to existing contrastive learning frameworks, GenSCL incorporates additional enhancements, including advanced image-based regularization techniques and an arbitrary teacher classifier. When applied to ResNet50 with the Momentum Contrast technique, GenSCL achieves a top-1 accuracy of 77.3% on ImageNet, a 4.1% relative improvement over traditional supervised contrastive learning. Moreover, our method establishes new state-of-the-art accuracies of 98.2% and 87.0% on CIFAR10 and CIFAR100 respectively when applied to ResNet50, marking the highest reported figures for this architecture.
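The idea of measuring cross-entropy between label similarity and latent similarity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the function name `generalized_supcon_loss`, the use of a dot product between label distributions as the label similarity, its row-wise normalization, and the temperature value are all hypothetical choices.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generalized_supcon_loss(embeddings, label_dists, temperature=0.1):
    """Cross-entropy between label similarity and latent similarity.

    embeddings:  list of feature vectors, one per sample
    label_dists: list of label probability vectors (one-hot or soft)
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Latent similarity: log-softmax over temperature-scaled cosine
        # similarities between anchor i and every other sample.
        logits = [_dot(z[i], z[j]) / temperature for j in others]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        log_probs = [l - log_denom for l in logits]
        # Label similarity (assumed: dot product of label distributions),
        # normalized into a target distribution over the other samples.
        sims = [_dot(label_dists[i], label_dists[j]) for j in others]
        s = sum(sims)
        if s == 0.0:
            continue  # anchor i shares no label mass with any other sample
        targets = [v / s for v in sims]
        total += -sum(t * lp for t, lp in zip(targets, log_probs))
    return total / n
```

Under these assumptions, one-hot labels give a uniform target over same-class samples, so the sketch reduces to averaging log-probabilities over positives, as in a standard supervised contrastive loss. Soft labels, such as those produced by CutMix or a teacher classifier, simply yield non-uniform targets.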
CV · May 19
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
Hyunju Kang, Woohyun Lee, Jaewon Kim et al.
Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually grounded reasoning for better image understanding. However, LMMs lack domain-specific knowledge, which limits their ability to generate accurate responses in complex industrial scenarios. In this work, we present JUDO, Juxtaposed Domain-Oriented Multimodal Reasoner, a framework that efficiently incorporates domain knowledge and context in visual and textual reasoning. Through visual reasoning, our model segments the defect region by juxtaposing query images with normal images as visual domain context, enabling a fine-grained visual comparative inspection. Furthermore, we inject domain knowledge through supervised fine-tuning (SFT) to enhance context understanding and subsequently guide domain reasoning through reinforcement learning (GRPO) with tailored rewards, opting for a domain-oriented reasoning process. Experimental results demonstrate that JUDO achieves superior performance on the MMAD benchmark, surpassing models such as Qwen2.5-VL-7B and GPT-4o. These results highlight the importance of enhancing domain knowledge and context for effective reasoning in anomaly understanding.
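For readers unfamiliar with GRPO, the group-relative advantage at its core can be sketched in a few lines. This is a generic illustration, not JUDO's training code: the function name, the use of the population standard deviation, the `eps` stabilizer, and the example reward values are assumptions, and the abstract does not specify the tailored rewards themselves.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its own group.

    rewards: scalar rewards for several responses sampled for the same
             prompt (the reward function itself is not modeled here).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std is an assumption
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one query.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above their group's mean receive positive advantages and the rest negative ones, so no separate value network is needed to provide a baseline.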
HC · May 8
Metaphors as Scaffolds: Spatial, Embodied, Fantastical, and Relational Framings for Youth Usable Privacy Design
JaeWon Kim, Alexis Hiniker
Mainstream usable privacy design frames privacy as administrative work -- settings, toggles, consent checkboxes -- abstracted from the relational, contextual, and embodied registers in which youth reason about disclosure. Drawing on a cross-project reading of three prior studies with youth aged 13--24, we examine how the metaphors that scaffold a privacy interaction shape the reasoning young users bring to it. \textit{Spatial} metaphors reduce cognitive load by recruiting intuitions about navigating physical space. \textit{Embodied} metaphors furnish a shared moral vocabulary that makes implicit norms about public and private space negotiable among users. \textit{Fantastical} metaphors recast privacy management as discoverable play, raising engagement with the granular controls that nuanced self-presentation requires. \textit{Relational} metaphors, by contrast, can lead youth past their own stated boundaries when felt intimacy masks institutional data flow, a risk already visible in AI companion products. Metaphor selection, we argue, is best understood as a first-order ethical design decision for youth privacy.
HC · May 7
The Capacity to Care: Designing Social Technology for Sustained Engagement With Societal Challenges
JaeWon Kim, Lindsay Popowski, Louisa Conwill et al.
People care about climate change, injustice, and humanitarian crises. The challenge is not apathy but capacity: sustained engagement with large-scale problems is psychologically costly, and social media architecture often amplifies awareness while providing few pathways to meaningful action. The result is rising distress, overwhelm, and disengagement -- particularly among young people who encounter global suffering through platforms designed for attention capture rather than constructive response. This workshop examines how social technology design shapes the conditions for sustained engagement with societal challenges. Drawing on Tronto's care ethics framework and research in moral psychology and platform studies, we ask why caring at scale is difficult and how social media can both exacerbate and potentially mitigate this difficulty. Tronto's framework shows that good care requires more than awareness: it demands responsibility, competence, and community. Dominant social media architectures stall the caring process at its earliest phase. We invite researchers and designers to identify platform designs that deplete or support the capacity to care, and to develop design directions for \textit{sustainable care}: engagement that people can maintain over time without burning out.
HC · May 7
Problem Space Attunement in Youth Social Media Design
JaeWon Kim
Social media is central to how young people maintain relationships, develop identity, and access communities, yet dominant platform designs often leave youth feeling constrained rather than supported. My dissertation argues that youth social media design is shaped by three forms of problem-space misattunement. \textit{Conceptual misattunement} occurs when the language of ``social media'' anchors participants to existing platform templates. I address this through Fictional Inquiry in a fictional magic-school setting that helps youth reason from felt relational needs. \textit{Definitional misattunement} occurs when researchers define what ``better'' means on youth's behalf. I address this through a Discord-based asynchronous community that supports youth-led collective inquiry. \textit{Evaluative misattunement} occurs when participants are asked to judge static or hypothetical designs. I address this through an ego-anchored, LLM-agent simulation sandbox. Together, these studies develop youth-grounded criteria and design directions for relationally supportive social media.
HC · May 7
Social Understanding, Placeness, and Identity Alignment: A Design Framework for Friendship-Supportive Youth Social Media
JaeWon Kim, Alexis Hiniker
We present a design framework for friendship-supportive youth social media, derived from a synthesis of five empirical studies with 331 youth participants (ages 13--25) using interviews, co-design, surveys, diary studies, and a field deployment. Iterative analysis of 209 design-relevant data points identified three pillars: \textit{Sense of Social Understanding} (interaction norms, interaction cues and scaffolding, social accountability and governance), \textit{Sense of Place} (third place and community, boundaries and personal spaces, shared presence), and \textit{Sense of Identity Alignment} (identity currency, identity plurality, relational identity signals). The framework maps nine design spaces through which platforms can support the conditions under which youth friendships form, deepen, and are maintained. It offers a shared vocabulary for locating contributions, comparing design interventions, and identifying under-explored areas for future work.
CV · Nov 23, 2024
Gradient-Free Classifier Guidance for Diffusion Model Sampling
Rahul Shenoy, Zhihong Pan, Kaushik Balakrishnan et al.
Diffusion models for image generation have demonstrated outstanding learning capabilities, effectively capturing the full distribution of the training dataset. They are known to generate wide variations in sampled images, albeit with a trade-off in image fidelity. Guided sampling methods, such as classifier guidance (CG) and classifier-free guidance (CFG), focus sampling in well-learned high-probability regions to generate images of high fidelity, but each has its limitations. CG is computationally expensive due to the use of back-propagation for classifier gradient descent, while CFG, being gradient-free, is more efficient but compromises class label alignment compared to CG. In this work, we propose an efficient guidance method that fully utilizes a pre-trained classifier without using gradient descent. By using the classifier solely in inference mode, a time-adaptive reference class label and corresponding guidance scale are determined at each time step for guided sampling. Experiments on both class-conditioned and text-to-image generation diffusion models demonstrate that the proposed Gradient-free Classifier Guidance (GFCG) method consistently improves class prediction accuracy. We also show GFCG to be complementary to other guided sampling methods like CFG. When combined with the state-of-the-art Autoguidance (ATG), without additional computational overhead, it enhances image fidelity while preserving diversity. For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%).
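The gradient-free guidance step that CFG-style methods share is a simple extrapolation between two noise predictions, sketched below. The function name `guided_noise` and the list-of-floats representation are assumptions for illustration. The abstract does not say how GFCG selects its reference class label or guidance scale from the classifier, so both are treated here as plain inputs rather than modeled.

```python
def guided_noise(eps_cond, eps_ref, scale):
    """Extrapolate from a reference noise prediction toward the conditional one.

    eps_cond: noise predicted under the target condition
    eps_ref:  noise predicted under a reference condition (the unconditional
              prediction in standard CFG)
    scale:    guidance scale; 1.0 returns eps_cond, larger values push
              further away from the reference
    """
    return [r + scale * (c - r) for c, r in zip(eps_cond, eps_ref)]


# Hypothetical two-element predictions at a single sampling step.
guided = guided_noise([1.0, -2.0], [0.0, 0.5], scale=2.0)
```

No gradient passes through a classifier in this step, which is why guidance of this form is cheaper than classifier guidance.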
AI · May 21, 2025
Children's Mental Models of AI Reasoning: Implications for AI Literacy Education
Aayushi Dangol, Robert Wolfe, Runhua Zhao et al.
As artificial intelligence (AI) advances in reasoning capabilities, most recently with the emergence of Large Reasoning Models (LRMs), understanding how children conceptualize AI's reasoning processes becomes critical for fostering AI literacy. While one of the "Five Big Ideas" in AI education highlights reasoning algorithms as central to AI decision-making, less is known about children's mental models in this area. Through a two-phase approach, consisting of a co-design session with 8 children followed by a field study with 106 children (grades 3-8), we identified three models of AI reasoning: Deductive, Inductive, and Inherent. Our findings reveal that younger children (grades 3-5) often attribute AI's reasoning to inherent intelligence, while older children (grades 6-8) recognize AI as a pattern recognizer. We highlight three tensions that surfaced in children's understanding of AI reasoning and conclude with implications for scaffolding AI curricula and designing explainable AI tools.
CV · Jan 12, 2024
3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image
Junuk Cha, Hansol Lee, Jaewon Kim et al.
This paper introduces a novel pipeline to reconstruct the geometry of multiple interacting people in clothing in a globally coherent scene space from a single image. The main challenge arises from occlusion: part of a human body is not visible from a single view because it is occluded by other people or by the body itself, which introduces missing geometry and physical implausibility (e.g., penetration). We overcome this challenge by utilizing two human priors for complete 3D geometry and surface contacts. For the geometry prior, an encoder learns to regress the image of a person with missing body parts to latent vectors; a decoder decodes these vectors to produce 3D features of the associated geometry; and an implicit network combines these features with a surface normal map to reconstruct complete and detailed 3D humans. For the contact prior, we develop an image-space contact detector that outputs a probability distribution of surface contacts between people in 3D. We use these priors to globally refine the body poses, enabling penetration-free and accurate reconstruction of multiple interacting people in clothing in the scene space. The results demonstrate that our method produces reconstructions that are more complete, globally coherent, and physically plausible than those of existing methods.