Jinwook Kim

CV
h-index39
9papers
58citations
Novelty39%
AI Score50

9 Papers

HCMar 19
Align-to-Scale: Mode Switching Technique for Unimanual 3D Object Manipulation with Gaze-Hand-Object Alignment in Extended Reality

Min-yung Kim, Jinwook Kim, Ken Pfeuffer et al.

As extended reality (XR) technologies rapidly become as ubiquitous as today's mobile devices, supporting one-handed interaction becomes essential for XR. However, the prevalent Gaze + Pinch interaction model partially supports unimanual interaction, where users select, move, and rotate objects with one hand, but scaling typically requires both hands. In this work, we leverage the spatial alignment between gaze and hand as a mode switch to enable single-handed pinch-to-scale. We design and evaluate several techniques geared for one-handed scaling and assess their usability in a compound translate-scale task. Our findings show that all proposed methods effectively enable one-handed scaling, but each method offers distinct advantages and trade-offs. To this end, we derive design guidelines to support futuristic 3D interfaces with unimanual interaction. Our work helps make eye-hand 3D interaction in XR more mobile, flexible, and accessible.

CYMar 31
A Regulatory Compliance Protocol for Asset Interoperability Between Traditional and Decentralized Finance in Tokenized Capital Markets

Jinwook Kim, Jonghun Hong

There have been various attempts at token standards on numerous blockchain platforms today to fundamentally change the way assets are traded in the traditional capital markets, but there is a lack of research and resolution on regulatory issues that become the common foundation for interoperability and reusable standards. Our proposal, Regulatory Compliance Protocol (RCP), is based on the regulations and reports of 15 global financial institutions and standardizes recommendations and guidelines involving the overall asset tokenization of TradFi and DeFi into five regulatory groups: Traceability, Confidentiality, Enforceability, Finality and Tokenizability, compiling them into 31 items and presenting a benchmark for technology and standards as an underlying protocol. To review the legality and effectiveness of RCP, it was validated based on three tokenization and trading scenarios, and through the RCP-based NEW-EIP, it showed superiority over other ERC protocols related to asset tokenization.

ROMay 5
RLDX-1 Technical Report

Dongyoung Kim, Huiwon Jang, Myungkyu Koo et al.

While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. $π_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while $π_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.

CRApr 4
Safety and Liveness of Cross-Domain State Preservation under Byzantine Faults: A Mechanized Proof in Isabelle/HOL

Jinwook Kim

Formally guaranteeing the safety and liveness of regulatory state transitions in cross-domain state synchronization systems is a problem of growing importance as tokenized assets are increasingly operated across heterogeneous blockchain networks and off-chain ledgers. This paper presents a mechanized proof of 2,348 lines in Isabelle/HOL establishing two complementary properties. First, cross-domain state preservation (safety): a regulatory state transition performed on one domain is faithfully reflected across all connected domains with structural preservation. This guarantee encompasses bidirectional roundtrip preservation, consistency across an arbitrary finite set of domains, and per-asset isolation. Second, liveness under Byzantine faults: in the presence of up to f < n/3 Byzantine nodes, we prove deterministic resolution of conflicting regulatory actions, deadlock freedom, and starvation freedom. In the combination of these two properties, the liveness proof discharges the honest-node assumption of the safety proof under Byzantine faults, promoting conditional safety to an unconditional guarantee. The seven generic locales derived in this process are domain-independent and reusable for arbitrary domains via Isabelle/HOL's interpretation mechanism. The application context is a regulatory state transition model based on the RCP framework (arXiv:2603.29278), which systematizes 31 requirements from 15 global financial regulatory authorities. All proof artifacts build in Isabelle/HOL without sorry or oops, have been submitted to the Archive of Formal Proofs (under review), and are publicly available on GitHub.

CVAug 26, 2025
SoccerNet 2025 Challenges Results

Silvio Giancola, Anthony Cioppa, Marc Gutiérrez-Pérez et al.

The SoccerNet 2025 Challenges mark the fifth annual edition of the SoccerNet open benchmarking effort, dedicated to advancing computer vision research in football video understanding. This year's challenges span four vision-based tasks: (1) Team Ball Action Spotting, focused on detecting ball-related actions in football broadcasts and assigning actions to teams; (2) Monocular Depth Estimation, targeting the recovery of scene geometry from single-camera broadcast clips through relative depth estimation for each pixel; (3) Multi-View Foul Recognition, requiring the analysis of multiple synchronized camera views to classify fouls and their severity; and (4) Game State Reconstruction, aimed at localizing and identifying all players from a broadcast video to reconstruct the game state on a 2D top-view of the field. Across all tasks, participants were provided with large-scale annotated datasets, unified evaluation protocols, and strong baselines as starting points. This report presents the results of each challenge, highlights the top-performing solutions, and provides insights into the progress made by the community. The SoccerNet Challenges continue to serve as a driving force for reproducible, open research at the intersection of computer vision, artificial intelligence, and sports. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.

SPFeb 28, 2022
Gait Events Prediction using Hybrid CNN-RNN-based Deep Learning models through a Single Waist-worn Wearable Sensor

Muhammad Zeeshan Arshad, Ankhzaya Jamsrandorj, Jinwook Kim et al.

Elderly gait is a source of rich information about their physical and mental health condition. As an alternative to the multiple sensors on the lower body parts, a single sensor on the pelvis has a positional advantage and an abundance of information acquirable. This study aimed to explore a way of improving the accuracy of gait event detection in the elderly using a single sensor on the waist and deep learning models. Data was gathered from elderly subjects equipped with three IMU sensors while they walked. The input was taken only from the waist sensor was used to train 16 deep-learning models including CNN, RNN, and CNN-RNN hybrid with or without the Bidirectional and Attention mechanism. The groundtruth was extracted from foot IMU sensors. Fairly high accuracy of 99.73% and 93.89% was achieved by the CNN-BiGRU-Att model at the tolerance window of $\pm$6TS ($\pm$6ms) and $\pm$1TS ($\pm$1ms) respectively. Advancing from the previous studies exploring gait event detection, the model showed a great improvement in terms of its prediction error having an MAE of 6.239ms and 5.24ms for HS and TO events respectively at the tolerance window of $\pm$1TS. The results showed that the use of CNN-RNN hybrid models with Attention and Bidirectional mechanisms is promising for accurate gait event detection using a single waist sensor. The study can contribute to reducing the burden of gait detection and increase its applicability in future wearable devices that can be used for remote health monitoring (RHM) or diagnosis based thereon.

HCJan 12, 2022
Effects of Virtual Room Size and Objects on Relative Translation Gain Thresholds in Redirected Walking

Dooyoung Kim, Jinwook Kim, Jae-eun Shin et al.

This paper investigates how the size of virtual space and objects within it affect the threshold range of relative translation gains, a Redirected Walking (RDW) technique that scales the user's movement in virtual space in different ratios for the width and depth. While previous studies assert that a virtual room's size affects relative translation gain thresholds on account of the virtual horizon's location, additional research is needed to explore this assumption through a structured approach to visual perception in Virtual Reality (VR). We estimate the relative translation gain thresholds in six spatial conditions configured by three room sizes and the presence of virtual objects (3 X 2), which were set according to differing Angles of Declination (AoDs) between eye-gaze and the forward-gaze. Results show that both size and virtual objects significantly affect the threshold range, it being greater in the large-sized condition and furnished condition. This indicates that the effect of relative translation gains can be further increased by constructing a perceived virtual movable space that is even larger than the adjusted virtual movable space and placing objects in it. Our study can be applied to adjust virtual spaces in synchronizing heterogeneous spaces without coordinate distortion where real and virtual objects can be leveraged to create realistic mutual spaces.

CVOct 15, 2021
Gait-based Frailty Assessment using Image Representation of IMU Signals and Deep CNN

Muhammad Zeeshan Arshad, Dawoon Jung, Mina Park et al.

Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based models can be utilized for the classification of gait type. Two deep learning models (a) SS-CNN, based on single stride input images, and (b) MS-CNN, based on 3 consecutive strides were proposed. It was shown that MS-CNN performs best with an accuracy of 85.1\%, while SS-CNN achieved an accuracy of 77.3\%. This is because MS-CNN can observe more features corresponding to stride-to-stride variations which is one of the key symptoms of frailty. Gait signals were encoded as images using STFT, CWT, and GAF. While the MS-CNN model using GAF images achieved the best overall accuracy and precision, CWT has a slightly better recall. This study demonstrates how image encoded gait data can be used to exploit the full potential of deep learning CNN models for the assessment of frailty.

CVOct 15, 2021
Gait-based Human Identification through Minimum Gait-phases and Sensors

Muhammad Zeeshan Arshad, Dawoon Jung, Mina Park et al.

Human identification is one of the most common and critical tasks for condition monitoring, human-machine interaction, and providing assistive services in smart environments. Recently, human gait has gained new attention as a biometric for identification to achieve contactless identification from a distance robust to physical appearances. However, an important aspect of gait identification through wearables and image-based systems alike is accurate identification when limited information is available, for example, when only a fraction of the whole gait cycle or only a part of the subject body is visible. In this paper, we present a gait identification technique based on temporal and descriptive statistic parameters of different gait phases as the features and we investigate the performance of using only single gait phases for the identification task using a minimum number of sensors. It was shown that it is possible to achieve high accuracy of over 95.5 percent by monitoring a single phase of the whole gait cycle through only a single sensor. It was also shown that the proposed methodology could be used to achieve 100 percent identification accuracy when the whole gait cycle was monitored through pelvis and foot sensors combined. The ANN was found to be more robust to fewer data features compared to SVM and was concluded as the best machine algorithm for the purpose.