CV | Apr 3, 2024 | Code
3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
SeungJeh Chung, JooHyun Park, HyeongYeop Kang
3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential by enabling the creation of uniquely styled 3D objects tailored to diverse scenes. Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated. While these methods reduce labor costs by minimizing reliance on manual work and expertise, they predominantly focus on holistic stylization, neglecting the application of desired styles to individual components of a 3D object. This limitation restricts fine-grained controllability. To address this gap, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP utilizes the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize individual parts of the 3D mesh and modify their appearance to match the styles specified in the text prompt. 3DStyleGLIP integrates part localization and stylization guidance within GLIP's shared embedding space through an end-to-end process, enabled by a part-level style loss and two complementary learning techniques. This neural methodology meets the user's need for fine-grained style editing and delivers high-quality part-specific stylization results, opening new possibilities for customization and flexibility in 3D content creation. Our code and results are available at https://github.com/sj978/3DStyleGLIP.
CV | Mar 29, 2024 | Code
P-Hologen: An End-to-End Generative Framework for Phase-Only Holograms
JooHyun Park, YuJin Jeon, HuiYong Kim et al.
Holography stands at the forefront of visual technology, offering immersive, three-dimensional visualizations through the manipulation of light wave amplitude and phase. Although generative models have been extensively explored in the image domain, their application to holograms remains relatively underexplored due to the inherent complexity of phase learning. Exploiting generative models for holograms offers exciting opportunities for advancing innovation and creativity, such as semantic-aware hologram generation and editing. Currently, the most viable approach for utilizing generative models in the hologram domain involves integrating an image-based generative model with an image-to-hologram conversion model, which comes at the cost of increased computational complexity and inefficiency. To tackle this problem, we introduce P-Hologen, the first end-to-end generative framework designed for phase-only holograms (POHs). P-Hologen employs vector quantized variational autoencoders to capture the complex distributions of POHs. It also integrates the angular spectrum method into the training process, constructing latent spaces for complex phase data using strategies from the image processing domain. Extensive experiments demonstrate that P-Hologen achieves superior quality and computational efficiency compared to existing methods. Furthermore, our model generates high-quality, diverse, previously unseen holographic content from its learned latent space without requiring pre-existing images. Our work paves the way for new applications and methodologies in holographic content creation, opening a new era in the exploration of generative holographic content. The code for our paper is publicly available at https://github.com/james0223/P-Hologen.
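For readers unfamiliar with the angular spectrum method mentioned above, the following is a minimal NumPy sketch of free-space propagation of a phase-only hologram to an image plane. It is not taken from the P-Hologen code: the function name `angular_spectrum_propagate` and the wavelength, pixel pitch, and distance values are illustrative assumptions, and the sketch shows only the propagation operator, not how the paper integrates it into training.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex 2D field by `distance` (metres) in free space.

    Hypothetical helper for illustration; all lengths are in metres.
    """
    ny, nx = field.shape
    # Spatial frequencies (cycles per metre) of the sampled field.
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    fxx, fyy = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; non-positive values are evanescent.
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    propagating = arg > 0
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    # Transfer function: pure phase delay, evanescent components dropped.
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# A random phase-only hologram: unit amplitude, arbitrary phase.
rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 2.0 * np.pi, size=(64, 64))
poh = np.exp(1j * phase)

# Assumed optics: 520 nm light, 8 micrometre pixels, 5 cm propagation.
recon = angular_spectrum_propagate(poh, 520e-9, 8e-6, 0.05)
intensity = np.abs(recon) ** 2  # what a sensor or viewer would observe
```

Because every step is a differentiable array operation, an operator of this kind can in principle sit inside a training loop so that a loss defined on the reconstructed image reaches the phase values.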
AI | Mar 15
RenderMem: Rendering as Spatial Memory Retrieval
JooHyun Park, HyeongYeop Kang
Embodied reasoning is inherently viewpoint-dependent: what is visible, occluded, or reachable depends critically on where the agent stands. However, existing spatial memory systems for embodied agents typically store either multi-view observations or object-centric abstractions, making it difficult to perform reasoning with explicit geometric grounding. We introduce RenderMem, a spatial memory framework that treats rendering as the interface between 3D world representations and spatial reasoning. Instead of storing fixed observations, RenderMem maintains a 3D scene representation and generates query-conditioned visual evidence by rendering the scene from viewpoints implied by the query. This enables embodied agents to reason directly about line-of-sight, visibility, and occlusion from arbitrary perspectives. RenderMem is fully compatible with existing vision-language models and requires no modification to standard architectures. Experiments in the AI2-THOR environment show consistent improvements on viewpoint-dependent visibility and occlusion queries over prior memory baselines.
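To make the viewpoint dependence concrete, here is a small geometric stand-in for a line-of-sight query. It is not RenderMem's method, which renders the scene and passes the result to a vision-language model; instead it tests whether the segment from a viewpoint to a target crosses any axis-aligned box, using the standard slab test. The function names and the toy scene are hypothetical.

```python
def segment_hits_box(p0, p1, box_min, box_max):
    """Return True if the segment p0 -> p1 intersects the axis-aligned box."""
    t_enter, t_exit = 0.0, 1.0  # parameter range of the segment still inside
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-12:
            # Segment is parallel to this slab: it must start inside it.
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
        else:
            t0 = (box_min[axis] - p0[axis]) / d
            t1 = (box_max[axis] - p0[axis]) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_enter = max(t_enter, t0)
            t_exit = min(t_exit, t1)
            if t_enter > t_exit:
                return False
    return True

def is_visible(viewpoint, target, occluders):
    """True if no occluder box blocks the straight line to the target."""
    return not any(
        segment_hits_box(viewpoint, target, lo, hi) for lo, hi in occluders
    )

# Toy scene: a thin wall between x = 2.0 and x = 2.2, spanning z in [-1, 1].
wall = ((2.0, 0.0, -1.0), (2.2, 2.0, 1.0))
mug = (4.0, 1.0, 0.0)

blocked_view = is_visible((0.0, 1.0, 0.0), mug, [wall])  # looking through the wall
side_view = is_visible((0.0, 1.0, 3.0), mug, [wall])     # looking past its edge
```

The same target is hidden from one position and visible from the other, which is the kind of query a memory of fixed observations cannot answer for a viewpoint it never recorded.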