CV Aug 8, 2022
Deep Billboards towards Lossless Real2Sim in Virtual Reality
Naruya Kondo, So Kuroki, Ryosuke Hyakuta et al.
An aspirational goal for virtual reality (VR) is to bring in a rich diversity of real world objects losslessly. Existing VR applications often convert objects into explicit 3D models with meshes or point clouds, which allow fast interactive rendering but also severely limit the quality and the types of supported objects, fundamentally upper-bounding the "realism" of VR. Inspired by the classic "billboards" technique in gaming, we develop Deep Billboards that model 3D objects implicitly using neural networks, where only a 2D image is rendered at a time based on the user's viewing direction. Our system, connecting a commercial VR headset with a server running neural rendering, allows real-time high-resolution simulation of detailed rigid objects, hairy objects, actuated dynamic objects and more in an interactive VR world, drastically narrowing the existing real-to-simulation (real2sim) gap. Additionally, we augment Deep Billboards with physical interaction capability, adapting classic billboards from screen-based games to immersive VR. At our pavilion, visitors can use our off-the-shelf setup to quickly capture their favorite objects and, within minutes, experience them in an immersive and interactive VR world with minimal loss of reality. Our project page: https://sites.google.com/view/deepbillboards/
LG Jun 6, 2023
Dance Generation by Sound Symbolic Words
Miki Okamura, Naruya Kondo, Tatsuki Fushimi et al.
This study introduces a novel approach to generate dance motions using onomatopoeia as input, with the aim of enhancing creativity and diversity in dance generation. Unlike text and music, onomatopoeia conveys rhythm and meaning through abstract word expressions without constraints on expression and without need for specialized knowledge. We adapt the AI Choreographer framework and employ the Sakamoto system, a feature extraction method for onomatopoeia focusing on phonemes and syllables. Additionally, we present a new dataset of 40 onomatopoeia-dance motion pairs collected through a user survey. Our results demonstrate that the proposed method enables more intuitive dance generation and can create dance motions using sound-symbolic words from a variety of languages, including those without onomatopoeia. This highlights the potential for diverse dance creation across different languages and cultures, accessible to a wider audience. Qualitative samples from our model can be found at: https://sites.google.com/view/onomatopoeia-dance/home/.
HC May 16
WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI
Jingjing Li, Zhi Liu, Xiyao Jin et al.
Cultural heritage exhibitions often struggle to sustain attention and support reflective engagement. Physical exhibitions rely on fixed interpretive aids that lack adaptability to individual backgrounds or curiosity, and their effectiveness depends heavily on a visitor's Personal Context, prior knowledge, and cultural literacy. Meanwhile, digital exhibitions prioritize convenience and accessibility but risk weakening the Physical and Social Contexts that define embodied cultural experience. WhiteTesseract addresses this gap by enabling in-situ interpretation through high-resolution XR and conversational AI. The system integrates spatial intelligence via artwork recognition to allow visitors to selectively reduce environmental distractions (via diminished reality) and engage in context-aware dialogue (via large language models). The goal is to preserve the richness of the physical and social environment while providing a flexible space for personal reflection, enhancing Personal Context without compromising physical authenticity. We deployed the system in a Claude Monet exhibition and conducted a controlled user study with 26 participants. Quantitative results showed that WhiteTesseract modulation significantly increased average viewing duration from 35.3 to 98.3 seconds (p < 0.001). Analysis of 529 visitor-AI interactions revealed that 60% extended beyond factual queries to include analytical, emotional, and comparative inquiries. These findings demonstrate how XR and AI can enrich the physical exhibition experience by supporting deeper, more personalized engagement without displacing the embodied value of cultural heritage. We discuss technical and social constraints for real-world deployment and limitations of our controlled setting.
CV Nov 25, 2021
VaxNeRF: Revisiting the Classic for Voxel-Accelerated Neural Radiance Field
Naruya Kondo, Yuya Ikeda, Andrea Tagliasacchi et al.
Neural Radiance Field (NeRF) is a popular method in data-driven 3D reconstruction. Given its simplicity and high-quality rendering, many NeRF applications are being developed. However, NeRF's major limitation is its slow speed. Many attempts have been made to speed up NeRF training and inference, including intricate code-level optimization and caching, use of sophisticated data structures, and amortization through multi-task and meta learning. In this work, we revisit the basic building blocks of NeRF through the lens of classic techniques that predate NeRF. We propose Voxel-Accelerated NeRF (VaxNeRF), integrating NeRF with visual hull, a classic 3D reconstruction technique that requires only binary foreground-background pixel labels per image. Visual hull, which can be optimized in about 10 seconds, can provide coarse in-out field separation to omit substantial amounts of network evaluations in NeRF. We provide a clean, fully pythonic, JAX-based implementation on the popular JaxNeRF codebase, consisting of only about 30 lines of code changes and a modular visual hull subroutine, and achieve about 2-8x faster learning on top of the highly performant JaxNeRF baseline with zero degradation in rendering quality. With sufficient compute, this effectively brings down full NeRF training from hours to 30 minutes. We hope VaxNeRF -- a careful combination of a classic technique with a deep method (that arguably replaced it) -- can empower and accelerate new NeRF extensions and applications, with its simplicity, portability, and reliable performance gains. Code is available at https://github.com/naruya/VaxNeRF .
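The visual hull idea described in the abstract can be illustrated with a toy sketch. It is not the VaxNeRF implementation: it assumes three axis-aligned orthographic silhouettes instead of calibrated perspective cameras, and the names `carve_visual_hull` and `make_square_mask` are hypothetical. A voxel survives only if every binary mask marks its projection as foreground; the discarded voxels correspond to samples for which a radiance network would not need to be evaluated.

```python
from itertools import product


def make_square_mask(size, lo, hi):
    """Binary size x size silhouette with a filled square covering [lo, hi)."""
    return [[1 if lo <= u < hi and lo <= v < hi else 0 for v in range(size)]
            for u in range(size)]


def carve_visual_hull(masks, size):
    """Return the set of voxels whose projection is foreground in every mask.

    masks maps a viewing axis (0, 1 or 2) to the silhouette seen along that
    axis. Orthographic, axis-aligned views are an assumption of this sketch.
    """
    occupied = set()
    for voxel in product(range(size), repeat=3):
        inside = True
        for axis, mask in masks.items():
            # Project by dropping the coordinate along the viewing axis.
            u, v = [c for i, c in enumerate(voxel) if i != axis]
            if not mask[u][v]:
                inside = False
                break
        if inside:
            occupied.add(voxel)
    return occupied


size = 8
mask = make_square_mask(size, 2, 6)
hull = carve_visual_hull({0: mask, 1: mask, 2: mask}, size)
skipped = size ** 3 - len(hull)
print(f"{len(hull)} of {size ** 3} voxels kept; {skipped} could be skipped")
```

In this toy grid only the voxels inside the central cube are kept, so most of the volume would be excluded before any network query.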
HC Jan 7, 2024
Expanding Horizons in HCI Research Through LLM-Driven Qualitative Analysis
Maya Grace Torii, Takahito Murakami, Yoichi Ochiai
What would research be like if we still needed to "send" papers typed with a typewriter? Our life and research environment have continually evolved, often accompanied by controversial opinions about new methodologies. In this paper, we embrace this change by introducing a new approach to qualitative analysis in HCI using Large Language Models (LLMs). We detail a method that uses LLMs for qualitative data analysis and present a quantitative framework using SBART cosine similarity for performance evaluation. Our findings indicate that LLMs not only match the efficacy of traditional analysis methods but also offer unique insights. Through a novel dataset and benchmark, we explore LLMs' characteristics in HCI research, suggesting potential avenues for further exploration and application in the field.
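The cosine similarity used in the evaluation framework can be sketched as follows. This is a generic illustration, not the authors' pipeline: the two vectors are made-up placeholders standing in for sentence embeddings of a human-written and an LLM-written code label, and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors, NOT real sentence embeddings.
human_label_vec = [0.2, 0.7, 0.1]
llm_label_vec = [0.25, 0.6, 0.2]
score = cosine_similarity(human_label_vec, llm_label_vec)
print(f"similarity = {score:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, which an evaluation of this kind would read as close semantic agreement between the two labels.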
LG Feb 10
Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
Yoichi Ochiai
While real-time image generation using diffusion models has advanced rapidly on NVIDIA GPUs, systematic optimization research on non-CUDA platforms such as Apple Silicon remains extremely limited. In this study, we conducted comprehensive optimization experiments across 10 phases targeting the Apple M3 Ultra (60-core GPU, 512 GB unified memory) with the goal of achieving real-time camera img2img transformation. We explored a wide range of techniques including CoreML conversion, quantization, Token Merging, Neural Engine utilization, compact model exploration, frame interpolation, kNN search-based synthesis, pix2pix-turbo, optical flow frame skipping, and knowledge distillation, quantitatively evaluating the effectiveness of each approach. Ultimately, by combining CoreML conversion of the distillation-specialized model SDXS-512 with a 3-thread camera pipeline, we achieved real-time camera img2img transformation at 22.7 FPS at 512x512 resolution. The primary contribution of this work is the systematic demonstration that optimization insights established for CUDA are not necessarily effective on Apple Silicon's unified memory architecture. We reveal an optimization landscape fundamentally different from that of NVIDIA GPUs -- including the absence of speedup from quantization, the ineffectiveness of parallel inference, and the unsuitability of the Neural Engine for large-scale models -- and provide practical guidelines for diffusion model inference on Apple Silicon.
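A three-thread pipeline of the kind mentioned above can be sketched with the standard library. The abstract does not say how the three threads are divided, so the capture / inference / display split here is an assumption, the stage functions are hypothetical stand-ins (integers replace camera frames and a tuple replaces the model output), and no CoreML call is made.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def capture(out_q, n_frames):
    """Stand-in for a camera thread: emits frame indices."""
    for i in range(n_frames):
        out_q.put(i)
    out_q.put(SENTINEL)


def infer(in_q, out_q):
    """Stand-in for the img2img model thread."""
    while True:
        frame = in_q.get()
        if frame is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(("stylised", frame))


def display(in_q, shown):
    """Stand-in for the display thread: records what would be drawn."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        shown.append(item)


def run_pipeline(n_frames):
    # Small bounded queues apply backpressure so no stage runs far ahead.
    frames_q = queue.Queue(maxsize=2)
    results_q = queue.Queue(maxsize=2)
    shown = []
    threads = [
        threading.Thread(target=capture, args=(frames_q, n_frames)),
        threading.Thread(target=infer, args=(frames_q, results_q)),
        threading.Thread(target=display, args=(results_q, shown)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown


shown = run_pipeline(5)
frame_budget_ms = 1000 / 22.7  # time available per frame at the reported rate
print(shown, f"{frame_budget_ms:.1f} ms per frame")
```

At the reported 22.7 FPS each frame has a budget of roughly 44 ms. This sketch blocks when a queue is full; a live camera pipeline might instead drop stale frames, which is a design choice the abstract does not specify.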
HC Jun 18, 2024
Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services
Shengdi Xiao, Jingjing Li, Tatsuki Fushimi et al.
User studies are crucial for meeting user needs. They typically involve constructing real experimental scenarios and recruiting participants. However, studies in emerging and unfamiliar domains face limitations, including safety concerns and low iteration efficiency. To address these challenges, this study utilises Generative Artificial Intelligence (GenAI) to create scenarios for user experience (UX). By recruiting real users to evaluate this experience, we can collect feedback that enables rapid iteration in the early design phase. The air taxi is particularly representative of these challenges and has been chosen as the case study for this research. The key contribution is the design of an Air Taxi Journey (ATJ) using Large Language Models (LLMs) and AI image and video generators. Based on the GPT-4-generated scripts, key visuals were created for the air taxi, and the ATJ was evaluated by 72 participants. Furthermore, the LLMs demonstrated the ability to identify and suggest environments that significantly improve participants' willingness toward air taxis. Education level and gender significantly influenced the change in participants' willingness and their satisfaction with the ATJ. Satisfaction with the ATJ serves as a mediator, significantly influencing participants' willingness to take air taxis. Our study confirms the capability of GenAI to support user studies, providing a feasible approach and valuable insights for designing air taxi UX in the early design phase.
HC Jan 27, 2021
See-Through Captions: Real-Time Captioning on Transparent Display for Deaf and Hard-of-Hearing People
Kenta Yamamoto, Ippei Suzuki, Akihisa Shitara et al.
Real-time captioning is a useful technique for deaf and hard-of-hearing (DHH) people to talk to hearing people. With the improvement in device performance and the accuracy of automatic speech recognition (ASR), real-time captioning is becoming an important tool for helping DHH people in their daily lives. To realize higher-quality communication and overcome the limitations of mobile and augmented-reality devices, real-time captioning that can be used comfortably while maintaining nonverbal communication and preventing incorrect recognition is required. Therefore, we propose a real-time captioning system that uses a transparent display. In this system, the captions are presented on both sides of the display to address the problem of incorrect ASR, and the highly transparent display makes it possible to see both the body language and the captions.
SD Dec 4, 2020
Acoustic Hologram Optimisation Using Automatic Differentiation
Tatsuki Fushimi, Kenta Yamamoto, Yoichi Ochiai
Acoustic holograms are the keystone of modern acoustics. They encode three-dimensional acoustic fields in two dimensions, and their quality determines the performance of acoustic systems. Optimisation methods that control only the phase of an acoustic wave are considered inferior to methods that control both the amplitude and phase of the wave. In this paper, we present Diff-PAT, an acoustic hologram optimisation algorithm with automatic differentiation. We demonstrate that our method achieves higher accuracy than conventional methods. The performance of Diff-PAT was evaluated by randomly generating 1000 sets of up to 32 control points for single-sided arrays and single-axis arrays. The improved acoustic hologram can be used in a wide range of applications of PATs without introducing any changes to existing systems that control the PATs. In addition, we applied Diff-PAT to an acoustic metamaterial and achieved a >8 dB increase in the peak signal-to-noise ratio of the acoustic hologram.
IV May 25, 2020
A Preliminary Study for Identification of Additive Manufactured Objects with Transmitted Images
Kenta Yamamoto, Ryota Kawamura, Kazuki Takazawa et al.
Additive manufacturing has the potential to become a standard method for manufacturing products, and product information is indispensable for the item distribution system. While most products carry barcodes on their exterior surfaces, research on embedding barcodes inside products is underway. This is because additive manufacturing makes it possible to manufacture a product and add information to it at the same time, and embedding information inside does not impair the exterior appearance of the product. However, products without embedded information cannot be identified, and embedded information cannot be rewritten later. In this study, we have developed a product identification system that does not require embedding barcodes inside. This system uses a transmission image of the product, which captures product-specific features such as differing inner support structures and manufacturing errors. We have shown through experiments that if datasets of transmission images are available, objects can be identified with an accuracy of over 90%. This result suggests that our approach can be useful for identifying objects without embedded information.
HC Dec 26, 2019
Discussion of Intelligent Electric Wheelchairs for Caregivers and Care Recipients
Satoshi Hashizume, Ippei Suzuki, Kazuki Takazawa et al.
To reduce the burden on caregivers, we developed an intelligent electric wheelchair. We held workshops with caregivers, asked them about the problems in caregiving, and developed problem-solving methods. In the workshops, the caregivers' physical fitness and the psychology of the older adults were identified as problems, and a solution was proposed. Based on the workshops, we implemented a cooperative operation function for multiple electric wheelchairs and demonstrated it at a nursing home. By listening to older adults, we obtained feedback on the automatically driven electric wheelchair. From the results of this study, we identified the issues and solutions to be applied to the intelligent electric wheelchair.
HC Nov 8, 2019
Sonovortex: Aerial Haptic Layer Rendering by Aerodynamic Vortex and Focused Ultrasound
Satoshi Hashizume, Amy Koike, Takayuki Hoshi et al.
In this paper, a method of rendering aerial haptics that uses an aerodynamic vortex and focused ultrasound is presented. Significant research has been conducted on haptic applications based on multiple phenomena such as magnetic and electric fields, focused ultrasound, and laser plasma. By combining multiple physical quantities, the resolution, distance, and magnitude of force are enhanced. To combine multiple tactile technologies, basic experiments on resolution and discrimination threshold are required. Separate user studies were conducted using aerodynamic and ultrasonic haptics. Moreover, the perception of their superposition, in addition to their resolution, was tested. Because these fields do not directly interfere with each other, the system enables the simultaneous perception of the tactile feedback of both stimuli. The results of this study are expected to contribute to expanding the expression of aerial haptic displays based on several principles.
HC Oct 11, 2017
Air Mounted Eyepiece: Design Methods for Aerial Optical Functions of Near-Eye and See-Through Display using Transmissive Mirror Device
Yoichi Ochiai, Kazuki Otao, Hiroyuki Osone
We propose a novel method to implement an optical see-through head mounted display which renders real aerial images with a wide viewing angle, called an Air Mounted Eyepiece (AME). To achieve the AME design, we employ an off-the-shelf head mounted display and a Transmissive Mirror Device (TMD), which is usually used in aerial real imaging systems. In the proposed method, we replicate the function of the head mounted display (HMD) itself in the air by using the TMD to present a real image of the eyepiece in front of the eye. Moreover, it can realize a 3D display with a wide viewing angle by placing a virtual lens in front of the eye without wearing an HMD. In addition to enhancing the experience of mixed reality and augmented reality, our proposed method can be used as a 3D imaging method in other applications such as automobiles and desktop work. We aim to contribute to the field of human-computer interaction and the research on eyepiece interfaces by discussing the advantages and the limitations of this near-eye optical system.
GR Jun 22, 2015
Fairy Lights in Femtoseconds: Aerial and Volumetric Graphics Rendered by Focused Femtosecond Laser Combined with Computational Holographic Fields
Yoichi Ochiai, Kota Kumagai, Takayuki Hoshi et al.
We present a method of rendering aerial and volumetric graphics using femtosecond lasers. A high-intensity laser excites physical matter to emit light at an arbitrary 3D position. Popular applications can then be explored, especially since plasma induced by a femtosecond laser is safer than that generated by a nanosecond laser. There are two methods of rendering graphics with a femtosecond laser in air: producing holograms using spatial light modulation technology, and scanning a laser beam with a galvano mirror. The holograms and workspace of the system proposed here occupy a volume of up to 1 cm^3; however, this size is scalable depending on the optical devices and their setup. This paper provides details of the principles, system setup, and experimental evaluation, and discusses the scalability, design space, and applications of this system. We tested two laser sources: an adjustable (30-100 fs) laser which projects up to 1,000 pulses per second at an energy of up to 7 mJ per pulse, and a 269-fs laser which projects up to 200,000 pulses per second at an energy of up to 50 uJ per pulse. We confirmed that the spatiotemporal resolutions of the volumetric displays implemented with these laser sources are 4,000 and 200,000 dots per second, respectively. Although we focus on laser-induced plasma in air, the discussion presented here is also applicable to other rendering principles such as fluorescence and microbubbles in solid/liquid materials.
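As a rough worked example, the quoted pulse rates and pulse energies imply an upper bound on the average optical power of each source. This is a back-of-the-envelope estimate, not a figure reported in the abstract, and it assumes the maximum repetition rate and the maximum pulse energy can be reached at the same time. The symbols P_avg, f_rep, and E_pulse are introduced here for illustration.

```latex
\begin{align*}
P_{\text{avg}} &\le f_{\text{rep}} \cdot E_{\text{pulse}} \\
P_{1} &\le 1{,}000\ \text{s}^{-1} \times 7\ \text{mJ} = 7\ \text{W} \\
P_{2} &\le 200{,}000\ \text{s}^{-1} \times 50\ \mu\text{J} = 10\ \text{W}
\end{align*}
```

Under that assumption the two sources have a similar average power budget but distribute it very differently: few high-energy pulses versus many low-energy pulses.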