CV · Sep 18, 2023
Cross-attention-based saliency inference for predicting cancer metastasis on whole slide images
Ziyu Su, Mostafa Rezapour, Usama Sajjad et al.
Although multiple instance learning (MIL) methods are widely used for automatic tumor detection on whole slide images (WSI), they suffer from the extreme class imbalance within the small tumor WSIs. This occurs when the tumor comprises only a few isolated cells. For early detection, it is of utmost importance that MIL algorithms can identify small tumors, even when they are less than 1% of the size of the WSI. Existing studies have attempted to address this issue using attention-based architectures and instance selection-based methodologies, but have not yielded significant improvements. This paper proposes cross-attention-based salient instance inference MIL (CASiiMIL), which involves a novel saliency-informed attention mechanism, to identify breast cancer lymph node micro-metastasis on WSIs without the need for any annotations. Apart from this new attention mechanism, we introduce a negative representation learning algorithm to facilitate the learning of saliency-informed attention weights for improved sensitivity on tumor WSIs. The proposed model outperforms the state-of-the-art MIL methods on two popular tumor metastasis detection datasets, and demonstrates great cross-center generalizability. In addition, it exhibits excellent accuracy in classifying WSIs with small tumor lesions. Moreover, we show that the proposed model has excellent interpretability attributed to the saliency-informed attention weights. We strongly believe that the proposed method will pave the way for training algorithms for early tumor detection on large datasets where acquiring fine-grained annotations is practically impossible.
HC · Mar 25
Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes
Renkai Ma, Shuo Niu, Lingyao Li et al.
AI companions enable deep emotional relationships by engaging a user's sense of identity, but they also pose risks like unhealthy emotional dependence. Mitigating these risks requires first understanding the underlying process of identity construction and negotiation with AI companions. Focusing on Character.AI (C.AI), a popular AI companion, we conducted an LLM-assisted thematic analysis of 22,374 online discussions on its subreddit. Using Identity Negotiation Theory as an analytical lens, we identified a three-stage process: 1) five user motivations; 2) an identity negotiation process involving three communication expectations and four identity co-construction strategies; and 3) three emotional outcomes. Our findings surface the identity work users perform as both performers and directors to co-construct identities in negotiation with C.AI. This process takes place within a socio-emotional sandbox where users can experiment with social roles and express emotions with non-human partners. Finally, we offer design implications for emotionally supporting users while mitigating the risks.
HC · Mar 9, 2024
A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation
Yao Lyu, He Zhang, Shuo Niu et al.
Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being applied, and the methodologies content creators employ with Gen-AI tools during the creation process. This study initially explores this emerging area through a qualitative analysis of 68 YouTube videos demonstrating Gen-AI usage. Our research focuses on identifying the content domains, the variety of tools used, the activities performed, and the nature of the final products generated by Gen-AI in the context of user-generated content.
HC · Mar 7
Monetizing Generative AI: YouTubers' Collective Knowledge on Earning from Generative AI Content
Shuo Niu, Yao Lyu, He Zhang et al.
Generative Artificial Intelligence (GenAI) is reshaping creative labor by enabling the rapid production of text, images, and videos. On YouTube, creators are developing new ways to leverage these tools and share knowledge about how to pursue income through such strategies. However, little is known about what GenAI knowledge has been collectively constructed around monetizing GenAI as a community practice of acting both with and against algorithmically mediated platforms. We analyze 377 YouTube videos in which creators publicly promote workflows, revenue claims, and monetization strategies for GenAI-enabled content. Our analysis identifies ten shared use cases that frame AI-supported income opportunities, and examines how this GenAI knowledge repository embodies a collective effort to leverage platform infrastructures for monetization -- including advertising, direct sales, affiliate marketing, and revenue-sharing models. We further surface structural tensions in AI-mediated creative labor, including unverifiable income claims, content misappropriation, synthetic engagement practices, and shifting authorship norms. We conceptualize creators' collective understanding and adoption of GenAI in the context of monetizing creative labor, with implications for the design of creator-centered GenAI technologies and responsible platform policy.
HC · Jun 27, 2024
Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression
Jiaying Lizzy Liu, Yunlong Wang, Yao Lyu et al.
Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and activity Annotations than emotion and genre Annotations. Moreover, we identified the potential and limitations of LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.
HC · Feb 14, 2022
Close-up and Whispering: An Understanding of Multimodal and Parasocial Interactions in YouTube ASMR videos
Shuo Niu, Hugh S. Manon, Ava Bartolome et al.
ASMR (Autonomous Sensory Meridian Response) has grown to immense popularity on YouTube and drawn HCI designers' attention to its effects and applications in design. YouTube ASMR creators incorporate visual elements, sounds, motifs of touching and tasting, and other scenarios in multisensory video interactions to deliver enjoyable and relaxing experiences to their viewers. ASMRtists engage viewers by social, physical, and task attractions. Research has identified the benefits of ASMR in mental wellbeing. However, ASMR remains an understudied phenomenon in the HCI community, constraining designers' ability to incorporate ASMR in video-based designs. This work annotates and analyzes the interaction modalities and parasocial attractions of 2663 videos to identify unique experiences. YouTube comment sections are also analyzed to compare viewers' responses to different ASMR interactions. We find that ASMR videos are experiences of multimodal social connection, relaxing physical intimacy, and sensory-rich activity observation. Design implications are discussed to foster future ASMR-augmented video interactions.
HC · Jan 11, 2021
#StayHome #WithMe: How Do YouTubers Help with COVID-19 Loneliness?
Shuo Niu, Ava Bartolome, Cat Mai et al.
Loneliness threatens public mental wellbeing during COVID-19. In response, YouTube creators participated in the #StayHome #WithMe movement (SHWM) and made myriad videos for people experiencing loneliness or boredom at home. User-shared videos generate parasocial attachment and virtual connectedness. However, there is limited knowledge of how creators contributed videos during disasters to provide social provisions as disaster relief. Grounded in Weiss's loneliness theory, this work analyzed 1488 SHWM videos to examine video sharing as a pathway to social provisions. Findings suggested that skill and knowledge sharing, entertaining arts, homelife activities, live chatting, and gameplay were the most popular video styles. YouTubers utilized parasocial relationships to form a space for staying away from the disaster. SHWM YouTubers provided friend-like, mentor-like, and family-like provisions through videos in different styles. Family-like provisions led to the highest overall viewer engagement. Based on the findings, design implications for supporting viewers' mental wellbeing in disasters are discussed.
HC · Nov 8, 2018
Towards Connecting Experiences during Collocated Events through Data Mining and Visualization
Shuo Niu, D. Scott McCrickard, Steve Harrison
Themed collocated events, such as conferences, workshops, and seminars, invite people with related life experiences to connect with each other. In an era when people record their lives through the Internet, individual experiences exist in different forms of digital content. People share digital life records during collocated events, such as blogs they wrote, Twitter posts they forwarded, and books they have read. However, connecting experiences during collocated events is challenging. Not only is one blind to the large volume of others' content, but identifying related experiential items also depends on how well experiences are retrieved. The collection of personal content from all participants forms a valuable group repository, from which connections between experiences can be mined. Visualizing the same or related experiences inspires conversations and supports social exchange. Common topics in group content also help participants generate new perspectives about the collocated group. Advances in machine learning and data visualization provide automated approaches to process large data and enable interactions with data repositories. This position paper promotes the idea of event mining: how to utilize state-of-the-art data processing and visualization techniques to design event mining systems for connecting experiences during collocated activities. We discuss empirical and constructive problems in this design space, and our preliminary study of deploying a tabletop-based system, BlogCloud, which supports experience re-visitation and exchange with machine learning and data visualization.
HC · Sep 30, 2018
Tensions on Trails: Understanding Differences between Group and Community Needs in Outdoor Settings
Lindah Kotut, Michael Horning, Derek Haqq et al.
This paper compares the needs of groups and communities in outdoor settings, seeking to identify subtle but important differences in the ways that their needs can be supported. We first examine the questions of who uses technology in outdoor settings, what their technological uses and needs are, and what conflicts exist between different trail users regarding technology use and experience. We then consider selected categories of people to understand their distinct needs when acting as groups and as communities. We conclude that it is important to explore the tensions between groups and communities to identify design opportunities.