Ridvan Aksu

8.6 SP Sep 2, 2020

American Sign Language Recognition Using RF Sensing

Sevgi Z. Gurbuz, Ali C. Gurbuz, Evie A. Malaia et al.

Many technologies for human-computer interaction have been designed for hearing individuals and depend upon vocalized speech, precluding users of American Sign Language (ASL) in the Deaf community from benefiting from these advancements. While great strides have been made in ASL recognition with video or wearable gloves, the use of video in homes has raised privacy concerns, while wearable gloves severely restrict movement and infringe on daily life. Methods: This paper proposes the use of RF sensors for HCI applications serving the Deaf community. A multi-frequency RF sensor network is used to acquire non-invasive, non-contact measurements of ASL signing irrespective of lighting conditions. The unique patterns of motion present in the RF data due to the micro-Doppler effect are revealed using time-frequency analysis with the Short-Time Fourier Transform. Linguistic properties of RF ASL data are investigated using machine learning (ML). Results: The information content, measured by fractal complexity, of ASL signing is shown to be greater than that of other upper body activities encountered in daily living. This can be used to differentiate daily activities from signing, while features from RF data show that imitation signing by non-signers is 99% differentiable from native ASL signing. Feature-level fusion of RF sensor network data is used to achieve 72.5% accuracy in classification of 20 native ASL signs. Implications: RF sensing can be used to study dynamic linguistic properties of ASL and design Deaf-centric smart environments for non-invasive, remote recognition of ASL. ML algorithms should be benchmarked on native, not imitation, ASL data.
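The time-frequency step mentioned in the abstract can be illustrated with a minimal sketch of a Short-Time Fourier Transform. This is not the paper's processing chain: the window length, hop size, Hann window, and the synthetic two-tone signal standing in for a radar return are all assumptions chosen for illustration, and the function names `stft_magnitude` and `peak_bin` are hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, window_len, hop):
    """Return one magnitude spectrum (list of |X[k]|) per windowed frame."""
    # Periodic Hann window (an assumed choice; the abstract names no window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_len)
              for n in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_len)]
        spectrum = []
        for k in range(window_len):
            # Direct DFT of the windowed segment (slow but dependency-free).
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                      for n in range(window_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def peak_bin(frame):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(frame)), key=lambda k: frame[k])


# Synthetic complex signal (assumption): the Doppler shift steps from
# frequency bin 4 to frequency bin 12 halfway through the recording.
N = 32
sig = ([cmath.exp(2j * math.pi * 4 * n / N) for n in range(64)]
       + [cmath.exp(2j * math.pi * 12 * n / N) for n in range(64)])

frames = stft_magnitude(sig, window_len=N, hop=16)
first_peak = peak_bin(frames[0])
last_peak = peak_bin(frames[-1])
```

Tracking the peak bin across frames shows how a frequency change over time, such as a micro-Doppler signature, becomes visible in the spectrogram even though a single Fourier transform of the whole signal would blur the two tones together.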

2.3 MM Mar 21, 2018

Viewport-Driven Rate-Distortion Optimized 360° Video Streaming

Jacob Chakareski, Ridvan Aksu, Xavier Corbillon et al.

The growing popularity of virtual and augmented reality communications and 360° video streaming is moving video communication systems into much more dynamic and resource-limited operating settings. The enormous data volume of 360° videos requires an efficient use of network bandwidth to maintain the desired quality of experience for the end user. To this end, we propose a framework for viewport-driven rate-distortion optimized 360° video streaming that integrates the user view navigation pattern and the spatiotemporal rate-distortion characteristics of the 360° video content to maximize the delivered user quality of experience for the given network/system resources. The framework comprises a methodology for constructing dynamic heat maps that capture the likelihood of navigating different spatial segments of a 360° video over time by the user, an analysis and characterization of its spatiotemporal rate-distortion characteristics that leverage preprocessed spatial tiling of the 360° view sphere, and an optimization problem formulation that characterizes the delivered user quality of experience given the user navigation patterns, 360° video encoding decisions, and the available system/network resources. Our experimental results demonstrate the advantages of our framework over the conventional approach of streaming a monolithic uniformly encoded 360° video and a state-of-the-art reference method. Considerable video quality gains of 4-5 dB are demonstrated in the case of two popular 4K 360° videos.
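The idea of spending bits where the viewer is likely to look can be sketched with a simple greedy allocator. This is a stand-in heuristic, not the optimization formulated in the paper: the tile viewing probabilities, the per-tile (rate, distortion) operating points, the bandwidth budget, and the function name `allocate` are all hypothetical values and names chosen for illustration.

```python
def allocate(tiles, budget):
    """Greedy rate allocation over tiles.

    tiles: list of (view_probability, options), where options is a list of
           (rate, distortion) pairs sorted by increasing rate.
    budget: total rate available across all tiles.
    Returns (levels, used): chosen option index per tile and total rate used.
    """
    levels = [0] * len(tiles)
    used = sum(options[0][0] for _, options in tiles)
    if used > budget:
        raise ValueError("budget cannot cover the lowest quality of every tile")

    while True:
        best, best_gain = None, 0.0
        for i, (prob, options) in enumerate(tiles):
            j = levels[i]
            if j + 1 >= len(options):
                continue  # tile already at its highest quality
            extra_rate = options[j + 1][0] - options[j][0]
            saved_dist = options[j][1] - options[j + 1][1]
            if used + extra_rate > budget:
                continue  # upgrade does not fit in the remaining budget
            # Expected distortion reduction per unit of extra rate.
            gain = prob * saved_dist / extra_rate
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        j = levels[best]
        used += tiles[best][1][j + 1][0] - tiles[best][1][j][0]
        levels[best] += 1
    return levels, used


# Hypothetical operating points shared by all tiles: (rate, distortion).
options = [(1, 10.0), (2, 6.0), (4, 4.0)]
# Hypothetical heat-map probabilities for three tiles.
tiles = [(0.7, options), (0.2, options), (0.1, options)]

levels, used = allocate(tiles, budget=7)
```

With these made-up numbers the most frequently viewed tile receives the highest quality level and the least viewed tile stays at the lowest, which is the qualitative behavior a viewport-driven scheme aims for compared with uniform encoding.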

2 Papers