LG · Dec 6, 2024
A Temporally Correlated Latent Exploration for Reinforcement Learning
SuMin Oh, WanSoo Kim, HyunJin Kim
Efficient exploration remains one of the longstanding problems of deep reinforcement learning. Instead of depending solely on extrinsic rewards from the environments, existing methods use intrinsic rewards to enhance exploration. However, we demonstrate that these methods are vulnerable to Noisy TV and stochasticity. To tackle this problem, we propose Temporally Correlated Latent Exploration (TeCLE), a novel intrinsic reward formulation that employs an action-conditioned latent space and temporal correlation. The action-conditioned latent space estimates the probability distribution of states, thereby avoiding the assignment of excessive intrinsic rewards to unpredictable states and effectively addressing both problems. Whereas previous works inject temporal correlation for action selection, the proposed method injects it for intrinsic reward computation. We find that the injected temporal correlation determines the exploratory behaviors of agents. Various experiments show that the environment where the agent performs well depends on the amount of temporal correlation. To the best of our knowledge, the proposed TeCLE is the first approach to consider the action-conditioned latent space and temporal correlation for curiosity-driven exploration. We prove that the proposed TeCLE can be robust to the Noisy TV and stochasticity in benchmark environments, including Minigrid and Stochastic Atari.
RO · Nov 11, 2021
An Online Multi-Index Approach to Human Ergonomics Assessment in the Workplace
Marta Lorenzini, Wansoo Kim, Arash Ajoudani
Work-related musculoskeletal disorders (WMSDs) remain one of the major occupational safety and health problems in the European Union. Thus, continuous tracking of workers' exposure to the factors that may contribute to their development is paramount. This paper introduces an online approach to monitor kinematic and dynamic quantities on the workers, providing an on-the-spot estimate of the physical load required in their daily jobs. A set of ergonomic indexes is defined to account for multiple potential contributors to WMSDs, also giving importance to the subject-specific requirements of the workers. To evaluate the proposed framework, a thorough experimental analysis was conducted on twelve human subjects considering tasks that represent typical working activities in the manufacturing sector. For each task, the ergonomic indexes that better explain the underlying physical load were identified through a statistical analysis and supported by the outcome of a surface electromyography (sEMG) analysis. A comparison was also made with a well-recognised, standard tool for evaluating human ergonomics in the workplace, to highlight the benefits introduced by the proposed framework. Results demonstrate the high potential of the proposed framework for identifying physical risk factors and, therefore, for adopting preventive measures. Another equally important contribution of this study is the creation of a comprehensive database of human kinodynamic measurements, which hosts multiple sensory data of healthy subjects performing typical industrial tasks.
SD · Oct 27, 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi, Juheon Lee, Wansoo Kim et al.
We present a neural analysis and synthesis (NANSY) framework that can manipulate the voice, pitch, and speed of an arbitrary speech signal. Most previous works have focused on using an information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on information perturbation. The idea is to perturb information in the original input signal (e.g., formant, pitch, and frequency response), thereby letting synthesis networks selectively take essential attributes to reconstruct the input signal. Because NANSY does not need any bottleneck structures, it enjoys both high reconstruction quality and controllability. Furthermore, NANSY does not require any labels associated with speech data such as text and speaker information, but rather uses a new set of analysis features, i.e., the wav2vec feature and a newly proposed pitch feature, Yingram, which allows for fully self-supervised training. Taking advantage of fully self-supervised training, NANSY can be easily extended to a multilingual setting by simply training it with a multilingual dataset. The experiments show that NANSY can achieve significant improvement in performance in several applications such as zero-shot voice conversion, pitch shift, and time-scale modification.