Li-Yi Wei

HC
h-index: 22
9 papers
282 citations
Novelty: 44%
AI Score: 40

9 Papers

HC · Aug 28, 2023
Automated Conversion of Music Videos into Lyric Videos

Jiaju Ma, Anyi Rao, Li-Yi Wei et al. · mit

Musicians and fans often produce lyric videos, a form of music videos that showcase the song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming as the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.

65.9 · CV · Mar 25
SemLayer: Semantic-aware Generative Segmentation and Layer Construction for Abstract Icons

Haiyang Xu, Ronghuan Wu, Li-Yi Wei et al.

Graphic icons are a cornerstone of modern design workflows, yet they are often distributed as flattened single-path or compound-path graphics, where the original semantic layering is lost. This absence of semantic decomposition hinders downstream tasks such as editing, restyling, and animation. We formalize this problem as semantic layer construction for flattened vector art and introduce SemLayer, a visual generation empowered pipeline that restores editable layered structures. Given an abstract icon, SemLayer first generates a chromatically differentiated representation in which distinct semantic components become visually separable. To recover the complete geometry of each part, including occluded regions, we then perform a semantic completion step that reconstructs coherent object-level shapes. Finally, the recovered parts are assembled into a layered vector representation with inferred occlusion relationships. Extensive qualitative comparisons and quantitative evaluations demonstrate the effectiveness of SemLayer, enabling editing workflows previously inapplicable to flattened vector graphics and establishing semantic layer reconstruction as a practical and valuable task. Project page: https://xxuhaiyang.github.io/SemLayer/

CV · May 17, 2019 · Code
Learning to Reconstruct 3D Manhattan Wireframes from a Single Image

Yichao Zhou, Haozhi Qi, Yuexiang Zhai et al.

In this paper, we propose a method to obtain a compact and accurate 3D wireframe representation from a single image by effectively exploiting global structural regularities. Our method trains a convolutional neural network to simultaneously detect salient junctions and straight lines, as well as predict their 3D depth and vanishing points. Compared with the state-of-the-art learning-based wireframe detection methods, our network is simpler and more unified, leading to better 2D wireframe detection. With global structural priors from parallelism, our method further reconstructs a full 3D wireframe model, a compact vector representation suitable for a variety of high-level vision tasks such as AR and CAD. We conduct extensive evaluations on a large synthetic dataset of urban scenes as well as real images. Our code and datasets have been made public at https://github.com/zhou13/shapeunity.

HC · Jan 11, 2024
DrawTalking: Building Interactive Worlds by Sketching and Speaking

Karl Toby Rosenberg, Rubaiat Habib Kazi, Li-Yi Wei et al.

We introduce DrawTalking, an approach to building and controlling interactive worlds by sketching and speaking while telling stories. It emphasizes user control and flexibility, and gives programming-like capability without requiring code. An early open-ended study with our prototype shows that the mechanics resonate and are applicable to many creative-exploratory use cases, with the potential to inspire and inform research in future natural interfaces for creative exploration and authoring.

GR · Apr 18, 2024
Compositional Neural Textures

Peihan Tu, Li-Yi Wei, Matthias Zwicker

Texture plays a vital role in enhancing visual richness in both real photographs and computer-generated imagery. However, the process of editing textures often involves laborious and repetitive manual adjustments of textons, which are the recurring local patterns that characterize textures. This work introduces a fully unsupervised approach for representing textures using a compositional neural model that captures individual textons. We represent each texton as a 2D Gaussian function whose spatial support approximates its shape, and an associated feature that encodes its detailed appearance. By modeling a texture as a discrete composition of Gaussian textons, the representation offers both expressiveness and ease of editing. Textures can be edited by modifying the compositional Gaussians within the latent space, and new textures can be efficiently synthesized by feeding the modified Gaussians through a generator network in a feed-forward manner. This approach enables a wide range of applications, including transferring appearance from an image texture to another image, diversifying textures, texture interpolation, revealing/modifying texture variations, edit propagation, texture animation, and direct texton manipulation. The proposed approach contributes to advancing texture analysis, modeling, and editing techniques, and opens up new possibilities for creating visually appealing images with controllable textures.
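As a rough illustration of the representation described in this abstract, the sketch below models each texton as an oriented 2D Gaussian with an attached feature vector and splats a set of textons onto a feature grid. It is not the authors' implementation: the class `GaussianTexton`, the function `splat_features`, the parameter names, and the simple weighted-sum compositing are all assumptions made for illustration, and the generator network that would turn such a grid into an image is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianTexton:
    """Hypothetical texton: an oriented 2D Gaussian plus an appearance feature."""
    cx: float        # center x
    cy: float        # center y
    sx: float        # standard deviation along the Gaussian's local x axis
    sy: float        # standard deviation along the Gaussian's local y axis
    theta: float     # rotation of the local axes, in radians
    feature: tuple   # appearance feature (illustrative; any fixed-length vector)

    def weight(self, x: float, y: float) -> float:
        """Unnormalized Gaussian response at (x, y); equals 1.0 at the center."""
        dx, dy = x - self.cx, y - self.cy
        c, s = math.cos(self.theta), math.sin(self.theta)
        u = c * dx + s * dy      # offset expressed in the Gaussian's local frame
        v = -s * dx + c * dy
        return math.exp(-0.5 * ((u / self.sx) ** 2 + (v / self.sy) ** 2))


def splat_features(textons, width, height):
    """Accumulate each texton's feature, weighted by its Gaussian, per pixel.

    Returns a height x width x dim nested list. Weighted summation is an
    assumed compositing rule chosen for simplicity.
    """
    dim = len(textons[0].feature)
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for t in textons:
                w = t.weight(x, y)
                for k in range(dim):
                    grid[y][x][k] += w * t.feature[k]
    return grid


if __name__ == "__main__":
    texton = GaussianTexton(cx=2.0, cy=2.0, sx=1.0, sy=1.0, theta=0.0, feature=(1.0,))
    before = splat_features([texton], 5, 5)
    texton.cx = 3.0              # "direct texton manipulation": move the texton
    after = splat_features([texton], 5, 5)
    print(before[2][2][0], after[2][3][0])
```

Editing a texture in this toy setting amounts to changing a texton's center, scale, rotation, or feature and splatting again, which mirrors the abstract's point that a discrete composition of Gaussians is easy to edit.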

HC · Jan 10, 2022
Instant Reality: Gaze-Contingent Perceptual Optimization for 3D Virtual Reality Streaming

Shaoyu Chen, Budmonde Duinkharjav, Xin Sun et al.

Media streaming has been adopted for a variety of applications such as entertainment, visualization, and design. Unlike video/audio streaming where the content is usually consumed sequentially, 3D applications such as gaming require streaming 3D assets to facilitate client-side interactions such as object manipulation and viewpoint movement. Compared to audio and video streaming, 3D streaming often requires larger data sizes and yet lower latency to ensure sufficient rendering quality, resolution, and latency for perceptual comfort. Thus, streaming 3D assets can be even more challenging than streaming audio or video, and existing solutions often suffer from long loading times or limited quality. To address this critical and timely issue, we propose a perceptually-optimized progressive 3D streaming method for spatial quality and temporal consistency in immersive interactions. Based on the human visual mechanisms in the frequency domain, our model selects and schedules the streaming dataset for optimal spatial-temporal quality. We also train a neural network for our model to accelerate this decision process for real-time client-server applications. We evaluate our method via subjective studies and objective analysis under varying network conditions (from 3G to 5G) and client devices (HMD and traditional displays), and demonstrate better visual quality and temporal consistency than alternative solutions.

GR · Aug 16, 2021
Autocomplete Repetitive Stroking with Image Guidance

Yilan Chen, Kin Chung Kwan, Li-Yi Wei et al.

Image-guided drawing can compensate for the lack of skills but often requires a significant number of repetitive strokes to create textures. Existing automatic stroke synthesis methods are usually limited to predefined styles or require indirect manipulation that may break the spontaneous flow of drawing. We present a method to autocomplete repetitive short strokes during users' normal drawing process. Users can draw over a reference image as usual. At the same time, our system silently analyzes the input strokes and the reference to infer strokes that follow users' input style when certain repetition is detected. Users can accept, modify, or ignore the system predictions and continue drawing, thus maintaining the fluid control of drawing. Our key idea is to jointly analyze image regions and operation history for detecting and predicting repetitions. The proposed system can effectively reduce users' workload in drawing repetitive short strokes and facilitates users in creating results with rich patterns.

HC · Aug 19, 2020
RealitySketch: Embedding Responsive Graphics and Visualizations in AR through Dynamic Sketching

Ryo Suzuki, Rubaiat Habib Kazi, Li-Yi Wei et al.

We present RealitySketch, an augmented reality interface for sketching interactive graphics and visualizations. In recent years, an increasing number of AR sketching tools enable users to draw and embed sketches in the real world. However, with the current tools, sketched contents are inherently static, floating in mid-air without responding to the real world. This paper introduces a new way to embed dynamic and responsive graphics in the real world. In RealitySketch, the user draws graphical elements on a mobile AR screen and binds them with physical objects in real-time and improvisational ways, so that the sketched elements dynamically move with the corresponding physical motion. The user can also quickly visualize and analyze real-world phenomena through responsive graph plots or interactive visualizations. This paper contributes a set of interaction techniques that enable capturing, parameterizing, and visualizing real-world motion without pre-defined programs and configurations. Finally, we demonstrate our tool with several application scenarios, including physics education, sports training, and in-situ tangible interfaces.

GR · Mar 30, 2017
Autocomplete 3D Sculpting

Mengqi Peng, Jun Xing, Li-Yi Wei

Digital sculpting is a popular means to create 3D models but remains a challenging task for many users. This can be alleviated by recent advances in data-driven and procedural modeling, albeit bounded by the underlying data and procedures. We propose a 3D sculpting system that assists users in freely creating models without predefined scope. With a brushing interface similar to common sculpting tools, our system silently records and analyzes users' workflows, and predicts what they might or should do in the future to reduce input labor or enhance output quality. Users can accept, ignore, or modify the suggestions and thus maintain full control and individual style. They can also explicitly select and clone past workflows over output model regions. Our key idea is to consider how a model is authored via dynamic workflows in addition to what is shaped in static geometry, for more accurate analysis of user intentions and more general synthesis of shape structures. The workflows contain potential repetitions for analysis and synthesis, including user inputs (e.g. pen strokes on a pressure sensing tablet), model outputs (e.g. extrusions on an object surface), and camera viewpoints. We evaluate our method via user feedback and authored models.