Jovan Popovic

h-index88
2papers

2 Papers

CVMar 27, 2024
Garment3DGen: 3D Garment Stylization and Texture Generation

Nikolaos Sarafianos, Tuur Stuyck, Xiaoyu Xiang et al.

We introduce Garment3DGen a new method to synthesize 3D garment assets from a base mesh given a single input image as guidance. Our proposed approach allows users to generate 3D textured clothes based on both real and synthetic images, such as those generated by text prompts. The generated assets can be directly draped and simulated on human bodies. We leverage the recent progress of image-to-3D diffusion methods to generate 3D garment geometries. However, since these geometries cannot be utilized directly for downstream tasks, we propose to use them as pseudo ground-truth and set up a mesh deformation optimization procedure that deforms a base template mesh to match the generated 3D target. Carefully designed losses allow the base mesh to freely deform towards the desired target, yet preserve mesh quality and topology such that they can be simulated. Finally, we generate high-fidelity texture maps that are globally and locally consistent and faithfully capture the input guidance, allowing us to render the generated 3D assets. With Garment3DGen users can generate the simulation-ready 3D garment of their choice without the need of artist intervention. We present a plethora of quantitative and qualitative comparisons on various assets and demonstrate that Garment3DGen unlocks key applications ranging from sketch-to-simulated garments or interacting with the garments in VR. Code is publicly available.

CVJun 27, 2025
Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero et al.

Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage from over 4,000 participants in diverse contexts. This dataset enables the development of AI technologies that understand dyadic embodied dynamics, unlocking breakthroughs in virtual agents, telepresence experiences, and multimodal content analysis tools. We also develop a suite of models that utilize the dataset to generate dyadic motion gestures and facial expressions aligned with human speech. These models can take as input both the speech and visual behavior of their interlocutors. We present a variant with speech from an LLM model and integrations with 2D and 3D rendering methods, bringing us closer to interactive virtual agents. Additionally, we describe controllable variants of our motion models that can adapt emotional responses and expressivity levels, as well as generating more semantically-relevant gestures. Finally, we discuss methods for assessing the quality of these dyadic motion models, which are demonstrating the potential for more intuitive and responsive human-AI interactions.