Roy Shilkrot

h-index: 17

Papers: 3

Citations: 13

Novelty: 38%

AI Score: 19

Ranked #187,814 of 194,257 authors (top 97%); #58,196 in CV (top 98%)

3 Papers

3.4 | CV | Feb 19, 2019

BusyHands: A Hand-Tool Interaction Database for Assembly Tasks Semantic Segmentation

Roy Shilkrot, Zhi Chai, Minh Hoai

Visual segmentation has advanced tremendously in recent years, with ready solutions for a wide variety of scene types, including human hands and other body parts. However, segmentation of human hands performing complex tasks, such as manual assembly, has received little attention. Segmenting hands from tools, workpieces, background, and other body parts is extremely difficult because of self-occlusions and intricate hand grips and poses. In this paper we introduce BusyHands, a large open dataset of pixel-level annotated images of hands performing 13 different tool-based assembly tasks, from both real-world captures and virtual-world renderings. Our first-of-its-kind dataset contains 7906 samples in total, with both RGB and depth images obtained from a Kinect V2 camera and Blender. We evaluate several state-of-the-art semantic segmentation methods on our dataset as a proposed performance benchmark.
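The abstract does not say which metric the benchmark reports. As an illustration only, the sketch below computes mean intersection-over-union (mIoU), a widely used score for semantic segmentation. The function name, the flat label lists, and the class numbering are assumptions made for this example, not details from the paper.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that occur in pred or truth.

    pred and truth are equal-length flat sequences of integer class labels,
    one per pixel (hypothetical layout chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: class 0 could stand for "background", class 1 for "hand".
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Classes that appear in neither label map are skipped so they do not drag the average toward zero.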

5.4 | HC | Dec 28, 2018

Enhanced Touchable Projector-depth System with Deep Hand Pose Estimation

Zhi Chai, Roy Shilkrot

Touchable projection with structured-light range cameras is a prolific medium for large interaction surfaces, affording multiple simultaneous users and a simple, cheap setup. However, robust touch detection in such projector-depth systems is difficult to achieve due to measurement noise. We propose a novel combination of surface touch detection and a deep network for hand pose estimation, which aids in detecting both on- and above-surface hand gestures, disambiguating multiple touching fingers, and recovering fingertip positions in the face of noisy input. We present the details of our GPU-accelerated system and an evaluation of its performance, as well as applications such as an enhanced virtual keyboard that utilizes the added features.
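The abstract gives no implementation details for the surface touch detection step. The sketch below shows one common baseline for depth-camera touch sensing, assumed here purely for illustration: compare each depth pixel against a pre-captured depth map of the empty surface and label it by its height above that surface. The thresholds, the millimetre units, and the function name are all hypothetical, and the deep hand pose network is not modeled.

```python
def classify_pixels(background_mm, depth_mm, touch_min=3, touch_max=15, hover_max=100):
    """Label each pixel "touch", "hover", or "none" by height above the surface.

    background_mm: depth of the empty surface per pixel (2D list, millimetres).
    depth_mm: current depth frame with the same shape.
    All thresholds are illustrative values, not taken from the paper.
    """
    labels = []
    for bg_row, d_row in zip(background_mm, depth_mm):
        row = []
        for bg, d in zip(bg_row, d_row):
            height = bg - d  # objects above the surface are closer to the camera
            if touch_min <= height < touch_max:
                row.append("touch")
            elif touch_max <= height < hover_max:
                row.append("hover")
            else:
                row.append("none")  # surface itself, sensor noise, or too far above
        labels.append(row)
    return labels


labels = classify_pixels([[1000, 1000, 1000]], [[1000, 992, 950]])
```

The lower threshold exists because depth noise near the surface makes heights of a few millimetres unreliable, which is the difficulty the abstract attributes to measurement noise.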

2.9 | SD | Dec 9, 2018

Increase Apparent Public Speaking Fluency By Speech Augmentation

Sagnik Das, Nisha Gandhi, Tejas Naik et al.

Fluent and confident speech is desirable to every speaker, but professional speech delivery requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system that can help non-professional speakers produce fluent, professional-like speech content, in turn contributing to better listener engagement and comprehension. We propose to achieve this by manipulating the disfluencies in human speech, such as the sounds 'uh' and 'um', filler words, and awkward long silences. Given any unrehearsed speech, we segment and silence the filled pauses, then adjust the duration of the imposed silences as well as other long ('disfluent') pauses using a predictive model learned from a professional speech dataset. Finally, we output an audio stream in which the speaker sounds more fluent, confident, and practiced than in the original recording. According to our quantitative evaluation, we significantly increase the fluency of speech by reducing the rate of pauses and fillers.
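The pipeline described in the abstract (silence the filled pauses, then adjust pause durations) can be sketched on an already-segmented transcript. This is a minimal illustration under stated assumptions: the `Segment` type, the label names, and the fixed `max_pause` cap are hypothetical, and the cap merely stands in for the learned duration model, whose details the abstract does not give.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str       # "speech", "filler", or "silence" (hypothetical labels)
    duration: float  # seconds


def smooth_disfluencies(segments, max_pause=0.5):
    """Silence fillers, merge adjacent silences, and cap each pause length.

    max_pause is a fixed stand-in for the predictive duration model.
    """
    merged = []
    for seg in segments:
        # A filled pause ("uh", "um") is replaced by silence of equal length.
        label = "silence" if seg.label == "filler" else seg.label
        if merged and merged[-1].label == label == "silence":
            merged[-1] = Segment("silence", merged[-1].duration + seg.duration)
        else:
            merged.append(Segment(label, seg.duration))
    # Shorten any pause that exceeds the cap; speech segments are untouched.
    return [
        Segment(s.label, min(s.duration, max_pause)) if s.label == "silence" else s
        for s in merged
    ]


original = [Segment("speech", 2.0), Segment("filler", 0.4),
            Segment("silence", 1.2), Segment("speech", 3.0)]
smoothed = smooth_disfluencies(original)
```

In this toy input the filler and the following silence merge into one 1.6-second pause, which the cap then shortens to 0.5 seconds.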