Lokesh Kumar

2 Papers

58.7 · AS · Mar 20
Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?

Lokesh Kumar, Nirmesh Shah, Ashishkumar P. Gudmalwar et al.

Human communication seamlessly integrates speech and bodily motion, where hand gestures naturally complement vocal prosody to express intent, emotion, and emphasis. While recent text-to-speech (TTS) systems have begun incorporating multimodal cues such as facial expressions or lip movements, the role of hand gestures in shaping prosody remains largely underexplored. We propose a novel multimodal TTS framework, Gesture2Speech, that leverages visual gesture cues to modulate prosody in synthesized speech. Motivated by the observation that confident and expressive speakers coordinate gestures with vocal prosody, we introduce a multimodal Mixture-of-Experts (MoE) architecture that dynamically fuses linguistic content and gesture features within a dedicated style extraction module. The fused representation conditions an LLM-based speech decoder, enabling prosodic modulation that is temporally aligned with hand movements. We further design a gesture-speech alignment loss that explicitly models their temporal correspondence to ensure fine-grained synchrony between gestures and prosodic contours. Evaluations on the PATS dataset show that Gesture2Speech outperforms state-of-the-art baselines in both speech naturalness and gesture-speech synchrony. To the best of our knowledge, this is the first work to utilize hand gesture cues for prosody control in neural speech synthesis. Demo samples are available at https://research.sri-media-analysis.com/aaai26-beeu-gesture2speech/

RO · Jul 2, 2021
A Levy Flight based Narrow Passage Sampling Method for Probabilistic Roadmap Planners

Shubham Shukla, Lokesh Kumar, Titas Bera et al.

Sampling-based probabilistic roadmap (PRM) planners have been successful in motion planning for robots with many degrees of freedom, but may fail to capture the connectivity of the configuration space in scenarios with a critical narrow passage. In this paper, we present a novel technique based on Lévy flights that generates key samples in the narrow regions of the configuration space and, when combined with a PRM, improves the completeness of the planner. Compared with pure random-walk-based methods, the technique substantially improves sample quality at the expense of minimal additional computation, yet it still outperforms the state-of-the-art random bridge building method in terms of number of collision calls, computational overhead, and sample quality. The method is robust to changes in the parameters that describe the structure of the narrow passage, which makes it more general. A number of 2D and 3D motion planning simulations are presented that show the effectiveness of the method.
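
To make the idea of Lévy-flight sampling concrete, the following is a minimal sketch, not the authors' algorithm. It assumes heavy-tailed step lengths drawn with Mantegna's algorithm, a hypothetical 2D world with a wall and a narrow gap, and illustrative names (`mantegna_sigma`, `levy_step_length`, `in_collision`, `levy_walk_samples`) and parameters (`beta`, `scale`) that do not come from the paper. A real planner would also check the connecting segment, not only the endpoint.

```python
import math
import random


def mantegna_sigma(beta):
    """Std. dev. of the numerator Gaussian in Mantegna's algorithm (0 < beta <= 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step_length(beta, rng):
    """Draw one non-negative, heavy-tailed step length."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return abs(u) / abs(v) ** (1 / beta)


def in_collision(q):
    """Hypothetical unit-square world: a wall at 0.45 <= x <= 0.55
    with a narrow gap at 0.48 <= y <= 0.52 (assumption for illustration)."""
    x, y = q
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return True
    return 0.45 <= x <= 0.55 and not (0.48 <= y <= 0.52)


def levy_walk_samples(start, n_steps, beta=1.5, scale=0.02, seed=0):
    """Random walk with Levy-distributed step lengths and uniform headings.
    Only collision-free endpoints are accepted and returned as samples."""
    rng = random.Random(seed)
    q = start
    samples = []
    for _ in range(n_steps):
        length = scale * levy_step_length(beta, rng)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        cand = (q[0] + length * math.cos(theta),
                q[1] + length * math.sin(theta))
        if not in_collision(cand):  # endpoint check only, in this sketch
            q = cand
            samples.append(q)
    return samples


if __name__ == "__main__":
    pts = levy_walk_samples((0.2, 0.5), 500)
    in_gap = [p for p in pts if 0.45 <= p[0] <= 0.55]
    print(len(pts), "free samples,", len(in_gap), "inside the narrow passage")
```

The many short steps explore locally while the occasional long jump relocates the walk, which is the general property that makes Lévy flights attractive for reaching hard-to-sample regions. The accepted samples could then be handed to a PRM as additional roadmap nodes.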