Yutaro Hirao

h-index7

3papers

4citations

Novelty53%

AI Score47

Ranked #34,298 of 194,257 authors (top 18%)#12,215 in CV (top 21%)

3 Papers

6.0CVMay 23

EMA: Effort Metric Attention for Anatomical Effort-Guided Human Motion Diffusion

Joshua Siy, Huakun Liu, Yutaro Hirao et al.

Human motion diffusion models can synthesize action sequences from text, but controlling motion intensity remains challenging. Existing approaches rely on effort-related adverbs, which are ambiguous and fail to capture quantitative aspects such as pacing, often resulting in flat and monotonous dynamics. We propose an intensity-control framework based on Effort Metric Attention (EMA), a cross-attention module that conditions diffusion on numerical effort signals. Inspired by Laban Movement Analysis (LMA), the framework focuses on the Time and Weight effort factors. We approximate these factors using two kinematic metrics: peak joint positional change for pacing and collective joint positional change for motion amount. EMA enables fine-grained, region-wise control without costly post-hoc optimization. We introduce two evaluation tasks, metric-to-motion consistency and body-part-level effort modulation, to assess numerical fidelity and localized control. Experiments and a user study show near-monotonic alignment between specified effort levels, generated motion dynamics, and established LMA descriptors. These results indicate effective and interpretable control of effort dynamics in practice.

5.1IVMay 24, 2025

Mind Your Vision: Multimodal Estimation of Refractive Disorders Using Electrooculography and Eye Tracking

Xin Wei, Huakun Liu, Yutaro Hirao et al.

Refractive errors are among the most common visual impairments globally, yet their diagnosis often relies on active user participation and clinical oversight. This study explores a passive method for estimating refractive power using two eye movement recording techniques: electrooculography (EOG) and video-based eye tracking. Using a publicly available dataset recorded under varying diopter conditions, we trained Long Short-Term Memory (LSTM) models to classify refractive power from unimodal (EOG or eye tracking) and multimodal configuration. We assess performance in both subject-dependent and subject-independent settings to evaluate model personalization and generalizability across individuals. Results show that the multimodal model consistently outperforms unimodal models, achieving the highest average accuracy in both settings: 96.207\% in the subject-dependent scenario and 8.882\% in the subject-independent scenario. However, generalization remains limited, with classification accuracy only marginally above chance in the subject-independent evaluations. Statistical comparisons in the subject-dependent setting confirmed that the multimodal model significantly outperformed the EOG and eye-tracking models. However, no statistically significant differences were found in the subject-independent setting. Our findings demonstrate both the potential and current limitations of eye movement data-based refractive error estimation, contributing to the development of continuous, non-invasive screening methods using EOG signals and eye-tracking data.

4.3GRMay 14, 2025Code

UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units

Huakun Liu, Hiroki Ota, Xin Wei et al.

Sparse wearable inertial measurement units (IMUs) have gained popularity for estimating 3D human motion. However, challenges such as pose ambiguity, data drift, and limited adaptability to diverse bodies persist. To address these issues, we propose UMotion, an uncertainty-driven, online fusing-all state estimation framework for 3D human shape and pose estimation, supported by six integrated, body-worn ultra-wideband (UWB) distance sensors with IMUs. UWB sensors measure inter-node distances to infer spatial relationships, aiding in resolving pose ambiguities and body shape variations when combined with anthropometric data. Unfortunately, IMUs are prone to drift, and UWB sensors are affected by body occlusions. Consequently, we develop a tightly coupled Unscented Kalman Filter (UKF) framework that fuses uncertainties from sensor data and estimated human motion based on individual body shape. The UKF iteratively refines IMU and UWB measurements by aligning them with uncertain human motion constraints in real-time, producing optimal estimates for each. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of UMotion in stabilizing sensor data and the improvement over state of the art in pose accuracy.