David Chuan-En Lin

HC
h-index48
7papers
122citations
Novelty44%
AI Score28

7 Papers

HCNov 22, 2022
VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space

David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee et al. · cmu

Video editing is a creative and complex endeavor and we believe that there is potential for reimagining a new video editing interface to better support the creative and exploratory nature of video editing. We take inspiration from latent space exploration tools that help users find patterns and connections within complex datasets. We present VideoMap, a proof-of-concept video editing interface that operates on video frames projected onto a latent space. We support intuitive navigation through map-inspired navigational elements and facilitate transitioning between different latent spaces through swappable lenses. We built three VideoMap components to support editors in three common video tasks. In a user study with both professionals and non-professionals, editors found that VideoMap helps reduce grunt work, offers a user-friendly experience, provides an inspirational way of editing, and effectively supports the exploratory nature of video editing. We further demonstrate the versatility of VideoMap by implementing three extended applications. For interactive examples, we invite you to visit our project page: https://humanvideointeraction.github.io/videomap.

HCOct 12, 2023
Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models

David Chuan-En Lin, Nikolas Martelaro · cmu

Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.

CVNov 22, 2022
Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior

David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee et al. · cmu

This paper investigates the challenge of extracting highlight moments from videos. To perform this task, we need to understand what constitutes a highlight for arbitrary video domains while at the same time being able to scale across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on this insight, we present Videogenic, a technique capable of creating domain-specific highlight videos for a diverse range of domains. In a human evaluation study (N=50), we show that a high-quality photograph collection combined with CLIP-based retrieval (which uses a neural network with semantic knowledge of images) can serve as an excellent prior for finding video highlights. In a within-subjects expert study (N=12), we demonstrate the usefulness of Videogenic in helping video editors create highlight videos with lighter workload, shorter task completion time, and better usability.

SDNov 9, 2023
What Do I Hear? Generating Sounds for Visuals with ChatGPT

David Chuan-En Lin, Nikolas Martelaro · cmu

This short paper introduces a workflow for generating realistic soundscapes for visual media. In contrast to prior work, which primarily focus on matching sounds for on-screen visuals, our approach extends to suggesting sounds that may not be immediately visible but are essential to crafting a convincing and immersive auditory environment. Our key insight is leveraging the reasoning capabilities of language models, such as ChatGPT. In this paper, we describe our workflow, which includes creating a scene context, brainstorming sounds, and generating the sounds.

HCJan 30, 2025
Inkspire: Supporting Design Exploration with Generative AI through Analogical Sketching

David Chuan-En Lin, Hyeonsu B. Kang, Nikolas Martelaro et al. · cmu

With recent advancements in the capabilities of Text-to-Image (T2I) AI models, product designers have begun experimenting with them in their work. However, T2I models struggle to interpret abstract language and the current user experience of T2I tools can induce design fixation rather than a more iterative, exploratory process. To address these challenges, we developed Inkspire, a sketch-driven tool that supports designers in prototyping product design concepts with analogical inspirations and a complete sketch-to-design-to-sketch feedback loop. To inform the design of Inkspire, we conducted an exchange session with designers and distilled design goals for improving T2I interactions. In a within-subjects study comparing Inkspire to ControlNet, we found that Inkspire supported designers with more inspiration and exploration of design ideas, and improved aspects of the co-creative process by allowing designers to effectively grasp the current state of the AI to guide it towards novel design intentions.

SDDec 17, 2021
Soundify: Matching Sound Effects to Video

David Chuan-En Lin, Anastasis Germanidis, Cristóbal Valenzuela et al.

In the art of video editing, sound helps add character to an object and immerse the viewer within a space. Through formative interviews with professional editors (N=10), we found that the task of adding sounds to video can be challenging. This paper presents Soundify, a system that assists editors in matching sounds to video. Given a video, Soundify identifies matching sounds, synchronizes the sounds to the video, and dynamically adjusts panning and volume to create spatial audio. In a human evaluation study (N=889), we show that Soundify is capable of matching sounds to video out-of-the-box for a diverse range of audio categories. In a within-subjects expert study (N=12), we demonstrate the usefulness of Soundify in helping video editors match sounds to video with lighter workload, reduced task completion time, and improved usability.

CVMay 30, 2021
Learning Personal Style from Few Examples

David Chuan-En Lin, Nikolas Martelaro

A key task in design work is grasping the client's implicit tastes. Designers often do this based on a set of examples from the client. However, recognizing a common pattern among many intertwining variables such as color, texture, and layout and synthesizing them into a composite preference can be challenging. In this paper, we leverage the pattern recognition capability of computational models to aid in this task. We offer a set of principles for computationally learning personal style. The principles are manifested in PseudoClient, a deep learning framework that learns a computational model for personal graphic design style from only a handful of examples. In several experiments, we found that PseudoClient achieves a 79.40% accuracy with only five positive and negative examples, outperforming several alternative methods. Finally, we discuss how PseudoClient can be utilized as a building block to support the development of future design applications.