Nicholas Rewkowski

HC
6 papers
365 citations
Novelty 37%
AI Score 22

6 Papers

CV Oct 5, 2021
Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction

Justin Wilson, Nicholas Rewkowski, Ming C. Lin et al.

Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for object and scene reconstruction. These surfaces are often poorly reconstructed and filled with depth discontinuities and holes, making it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echo-Reconstruction, an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experiences. The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance 3D reconstructions of scenes containing open spaces and reflective surfaces through depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, VR demo, and experimental results from real-world and virtual scenes with challenging surfaces and sound indicate high success rates on material classification, depth estimation, and closed/open surface detection, leading to considerable visual and audio improvement in 3D scenes (see Figure 1).

MM Jul 31, 2021
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning

Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski et al.

We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences. We leverage the Mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective cues. We design an affective encoder using multi-scale spatial-temporal graph convolutions to transform 3D pose sequences into latent, pose-based affective features. We use our affective encoder in both our generator, where it learns affective features from the seed poses to guide the gesture synthesis, and our discriminator, where it enforces that the synthesized gestures contain the appropriate affective expressions. We perform extensive evaluations on two benchmark datasets for gesture synthesis from speech, the TED Gesture Dataset and the GENEA Challenge 2020 Dataset. Compared to the best baselines, we improve the mean absolute joint error by 10-33%, the mean acceleration difference by 8-58%, and the Fréchet Gesture Distance by 21-34%. We also conduct a user study and observe that compared to the best current baselines, around 15.28% of participants indicated our synthesized gestures appear more plausible, and around 16.32% of participants felt the gestures had more appropriate affective expressions aligned with the speech.

HC Jan 26, 2021
An Overview of Enhancing Distance Learning Through Augmented and Virtual Reality Technologies

Elizabeth Childs, Ferzam Mohammad, Logan Stevens et al.

Although distance learning presents a number of interesting educational advantages as compared to in-person instruction, it is not without its downsides. We first assess the educational challenges presented by distance learning as a whole and identify four main challenges that distance learning currently presents as compared to in-person instruction: the lack of social interaction, reduced student engagement and focus, reduced comprehension and information retention, and the lack of flexible and customizable instructor resources. After assessing each of these challenges in depth, we examine how AR/VR technologies might serve to address each challenge along with their current shortcomings, and finally outline the further research that is required to fully understand the potential of AR/VR technologies as they apply to distance learning.

HC Jan 26, 2021
Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

Uttaran Bhattacharya, Nicholas Rewkowski, Abhishek Banerjee et al.

We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target virtual agents' intended gender and handedness in our generation pipeline. We train and evaluate our network on the MPI Emotional Body Expressions Database and observe that our network produces state-of-the-art performance in generating gestures for virtual agents aligned with the text for narration or conversation. Our network can generate these gestures at interactive rates on a commodity GPU. We conduct a web-based user study and observe that around 91% of participants rated our generated gestures as at least plausible on a five-point Likert scale. The emotions perceived by the participants from the gestures are also strongly positively correlated with the corresponding intended emotions, with a minimum Pearson coefficient of 0.77 in the valence dimension.
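
The Pearson coefficient quoted above measures how linearly the perceived emotion ratings track the intended ones. A minimal sketch of that statistic follows; the function name `pearson_r` and the sample ratings are hypothetical and are not taken from the paper's study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical valence values: intended by the generator vs. perceived by raters.
intended = [-0.8, -0.3, 0.0, 0.4, 0.9]
perceived = [-0.6, -0.4, 0.1, 0.3, 0.7]
print(round(pearson_r(intended, perceived), 3))
```

A value near 1 means perceived valence rises almost linearly with intended valence; a value near 0 means no linear relationship.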

SD Feb 19, 2019
P-Reverb: Perceptual Characterization of Early and Late Reflections for Auditory Displays

Atul Rungta, Nicholas Rewkowski, Roberta Klatzky et al.

We introduce a novel, perceptually derived metric (P-Reverb) that relates the just-noticeable difference (JND) of the early sound field (also called early reflections) to the late sound field (known as late reflections or reverberation). Early and late reflections are crucial components of the sound field and provide multiple perceptual cues for auditory displays. We conduct two extensive user evaluations that relate the JNDs of early reflections and late reverberation in terms of the mean-free path of the environment and present a novel P-Reverb metric. Our metric is used to estimate dynamic reverberation characteristics efficiently in terms of important parameters like reverberation time (RT60). We show the numerical accuracy of our P-Reverb metric in estimating RT60. Finally, we use our metric to design an interactive sound propagation algorithm and demonstrate its effectiveness on various benchmarks.
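
For context on the two quantities named above, the sketch below computes the classical mean free path of a room (4V/S) and Sabine's estimate of RT60 (0.161 V / (S * alpha), in metric units). This is standard room acoustics, not the P-Reverb metric itself, which the abstract does not define; the function names and the example room are assumptions for illustration.

```python
def mean_free_path(volume_m3, surface_area_m2):
    """Average distance a sound ray travels between reflections: 4V / S."""
    if volume_m3 <= 0 or surface_area_m2 <= 0:
        raise ValueError("volume and surface area must be positive")
    return 4.0 * volume_m3 / surface_area_m2


def sabine_rt60(volume_m3, surface_area_m2, mean_absorption):
    """Sabine's reverberation time in seconds: 0.161 * V / (S * alpha)."""
    if not 0 < mean_absorption <= 1:
        raise ValueError("mean absorption coefficient must be in (0, 1]")
    if volume_m3 <= 0 or surface_area_m2 <= 0:
        raise ValueError("volume and surface area must be positive")
    return 0.161 * volume_m3 / (surface_area_m2 * mean_absorption)


# Hypothetical 5 m x 4 m x 3 m room with an average absorption of 0.2.
volume = 5 * 4 * 3                      # 60 m^3
surface = 2 * (5 * 4 + 5 * 3 + 4 * 3)   # 94 m^2
mfp = mean_free_path(volume, surface)
rt60 = sabine_rt60(volume, surface, 0.2)
print(f"mean free path = {mfp:.2f} m, RT60 = {rt60:.2f} s")
```

Because V/S equals one quarter of the mean free path, Sabine's formula can also be written as RT60 = 0.161 * MFP / (4 * alpha), which shows why the mean free path is a natural parameter for reverberation estimates.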

SD Apr 20, 2017
Effects of virtual acoustics on dynamic auditory distance perception

Atul Rungta, Nicholas Rewkowski, Roberta Klatzky et al.

Sound propagation encompasses various acoustic phenomena including reverberation. Current virtual acoustic methods, ranging from parametric filters to physically accurate solvers, can simulate reverberation with varying degrees of fidelity. We investigate the effects of reverberant sounds generated using different propagation algorithms on acoustic distance perception, i.e., how far away humans perceive a sound source to be. In particular, we evaluate two classes of methods for real-time sound propagation in dynamic scenes based on parametric filters and ray tracing. Our study shows that the more accurate method exhibits less distance compression than the approximate, filter-based method. This suggests that accurate reverberation in VR results in a better reproduction of acoustic distances. We also quantify the levels of distance compression introduced by different propagation methods in a virtual environment.
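
One common way to quantify distance compression, assumed here rather than taken from the paper, is to fit a power law perceived = k * actual^a to listener estimates; an exponent a below 1 indicates compression. The sketch below fits k and a by least squares in log-log space. The function name `fit_power_law` and the sample data are hypothetical.

```python
import math


def fit_power_law(actual, perceived):
    """Fit perceived = k * actual**a by linear regression on logarithms.

    Returns (k, a). All distances must be positive.
    """
    if len(actual) != len(perceived) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    log_x = [math.log(d) for d in actual]
    log_y = [math.log(p) for p in perceived]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    if sxx == 0:
        raise ValueError("actual distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    a = sxy / sxx                       # slope in log-log space = exponent
    k = math.exp(mean_y - a * mean_x)   # intercept in log-log space = log k
    return k, a


# Hypothetical listener estimates (metres) for sources at known distances.
actual = [1.0, 2.0, 4.0, 8.0, 16.0]
perceived = [1.2, 1.9, 3.0, 4.6, 7.1]
k, a = fit_power_law(actual, perceived)
print(f"k = {k:.2f}, a = {a:.2f}")
```

Comparing the fitted exponent across propagation methods gives a single number per method: the closer a is to 1, the less the method compresses perceived distance.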