Kenji Suzuki

CV
h-index: 36
20 papers
182 citations
Novelty: 41%
AI Score: 45

20 Papers

IV | Sep 11, 2024 | Code
BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs

Haolin Wang, Yafei Ou, Prasoon Ambalathankandy et al.

Conventional radiography is a widely used imaging technology for diagnosing, monitoring, and prognosticating musculoskeletal (MSK) diseases because of its easy availability, versatility, and cost-effectiveness. In conventional radiographs, bone overlaps are prevalent and can impede the accurate assessment of bone characteristics by radiologists or algorithms, posing significant challenges to conventional and computer-aided diagnoses. This work initiated the study of a challenging scenario, bone layer separation in conventional radiographs, in which separating overlapped bone regions enables independent assessment of the bone characteristics of each bone layer and lays the groundwork for MSK disease diagnosis and its automation. This work proposed a Bone Layer Separation GAN (BLS-GAN) framework that can produce high-quality bone layer images with reasonable bone characteristics and texture. The framework introduced a reconstructor based on conventional radiography imaging principles, which achieved efficient reconstruction and mitigated the recurrent calculations and training instability caused by soft tissue in the overlapped regions. Additionally, pre-training with synthetic images was implemented to enhance the stability of both the training process and the results. The generated images passed the visual Turing test and improved performance in downstream tasks. This work affirms the feasibility of extracting bone layer images from conventional radiographs, which holds promise for leveraging bone layer separation technology to facilitate more comprehensive analytical research in MSK diagnosis, monitoring, and prognosis. Code and dataset: https://github.com/pokeblow/BLS-GAN.git.

RO | Aug 3, 2022
Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics

Diego Paez-Granados, Yujie He, David Gonon et al.

Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control, evaluated on an autonomous personal mobility vehicle. We propose evaluation metrics that account for efficiency, controller response, and crowd interactions in natural crowds. We report the results of over 110 trials in different crowd types: sparse, flows, and mixed traffic, with low- (< 0.15 ppsm), mid- (< 0.65 ppsm), and high- (< 1 ppsm) pedestrian densities. We present comparative results between two low-level obstacle avoidance methods and a baseline of shared control. Results show a 10% drop in relative time to goal in the highest-density tests, with no decrease in any other efficiency metric. Moreover, autonomous navigation proved comparable to shared-control navigation, with lower relative jerk and significantly higher fluency in commands, indicating high compatibility with the crowd. We conclude that the reactive controller fulfils a necessary task of fast and continuous adaptation to crowd navigation, and that it should be coupled with high-level planners for environmental and situational awareness.

CV | Dec 5, 2022
Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Naoki Matsunaga, Masato Ishii, Akio Hayakawa et al.

Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers on a small amount of annotated data and then infer the segmentation map of a target image. Users then manipulate the map to instruct how the image will be edited. We utilize a pre-trained diffusion model to generate edited images aligned with the user's intention with pixel-wise guidance. The effective combination of the proposed guidance and other techniques enables highly controllable editing while preserving the region outside the edited area, thereby meeting our requirements. The experimental results demonstrate that our proposal outperforms the GAN-based method in editing quality and speed.

46.3 | HC | Mar 27
User Involvement in Robotic Wheelchair Development: A Decade of Limited Progress

Mario Andres Chavarria, Santiago Price Torrendell, Aude Billard et al.

Robotic wheelchairs (RWs) offer significant potential to enhance autonomy and participation for people with mobility impairments, yet many systems have failed to achieve sustained real-world adoption. This narrative literature review examined the extent and quality of end-user involvement in RW design, development, and evaluation over the past decade (2015-2025), assessed against core principles shared by major user-involvement approaches (e.g., user-/human-centered design, participatory/co-design, and inclusive design). The findings indicate that user involvement remains limited and is predominantly concentrated in late-stage evaluation rather than in early requirements definition or iterative co-design. Of the 399 records screened, only 23 studies (about 6%) met the inclusion criteria of verifiable end-user involvement, and many relied on small samples, often around ten participants, with limited justification for sample size selection, proxy users, laboratory-based validation, and non-standardized feedback methods. Research teams were largely engineering-dominated (about 89%) and geographically concentrated in high-income countries. Despite strong evidence that sustained user engagement improves usability and adoption in assistive technology, its systematic implementation in RW research remains rare. Advancing the field requires embedding participatory methodologies throughout the design lifecycle and addressing systemic barriers that constrain meaningful user involvement.

47.5 | HC | Mar 18
ViSTAR: Virtual Skill Training with Augmented Reality with 3D Avatars and LLM coaching agent

Chunggi Lee, Hayato Saiki, Tica Lin et al.

We present ViSTAR, a Virtual Skill Training system in AR that supports self-guided basketball skill practice, with feedback on balance, posture, and timing. From a formative study with basketball players and coaches, the system addresses three challenges: understanding skills, identifying errors, and correcting mistakes. ViSTAR follows the Behavioral Skills Training (BST) framework: instruction, modeling, rehearsal, and feedback. It provides feedback through visual overlays, rhythm and timing cues, and an AI-powered coaching agent using 3D motion reconstruction. We generate verbal feedback by analyzing spatio-temporal joint data and mapping features to natural-language coaching cues via a Large Language Model (LLM). A key novelty is this feedback generation: motion features become concise coaching insights. In two studies (N=16), participants generally preferred our AI-generated feedback to coach feedback and reported that ViSTAR helped them notice posture and balance issues and refine movements beyond self-observation.

CV | Feb 5
Multi-AD: Cross-Domain Unsupervised Anomaly Detection for Medical and Industrial Applications

Wahyu Rahmaniar, Kenji Suzuki

Traditional deep learning models often lack annotated data, especially in cross-domain applications such as anomaly detection, which is critical for early disease diagnosis in medicine and defect detection in industry. To address this challenge, we propose Multi-AD, a convolutional neural network (CNN) model for robust unsupervised anomaly detection across medical and industrial images. Our approach employs the squeeze-and-excitation (SE) block to enhance feature extraction via channel-wise attention, enabling the model to focus on the most relevant features and detect subtle anomalies. Knowledge distillation (KD) transfers informative features from the teacher to the student model, enabling effective learning of the differences between normal and anomalous data. Then, the discriminator network further enhances the model's capacity to distinguish between normal and anomalous data. At the inference stage, by integrating multi-scale features, the student model can detect anomalies of varying sizes. The teacher-student (T-S) architecture ensures consistent representation of high-dimensional features while adapting them to enhance anomaly detection. Multi-AD was evaluated on several medical datasets, including brain MRI, liver CT, and retina OCT, as well as industrial datasets, such as MVTec AD, demonstrating strong generalization across multiple domains. Experimental results demonstrated that our approach consistently outperformed state-of-the-art models, achieving the best average AUROC for both image-level (81.4% for medical and 99.6% for industrial) and pixel-level (97.0% for medical and 98.4% for industrial) tasks, making it effective for real-world applications.

CV | Dec 9, 2024
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition

Michael Yeung, Toya Teramoto, Songtao Wu et al.

The use of large-scale, web-scraped datasets to train face recognition models has raised significant privacy and bias concerns. Synthetic methods mitigate these concerns and provide scalable and controllable face generation to enable fair and accurate face recognition. However, existing synthetic datasets display limited intraclass and interclass diversity and do not match the face recognition performance obtained using real datasets. Here, we propose VariFace, a two-stage diffusion-based pipeline to create fair and diverse synthetic face datasets to train face recognition models. Specifically, we introduce three methods: Face Recognition Consistency to refine demographic labels, Face Vendi Score Guidance to improve interclass diversity, and Divergence Score Conditioning to balance the identity preservation-intraclass diversity trade-off. When constrained to the same dataset size, VariFace considerably outperforms previous synthetic datasets (0.9200 $\rightarrow$ 0.9405) and achieves comparable performance to face recognition models trained with real data (Real Gap = -0.0065). In an unconstrained setting, VariFace not only consistently achieves better performance compared to previous synthetic methods across dataset sizes but also, for the first time, outperforms the real dataset (CASIA-WebFace) across six evaluation datasets. This sets a new state-of-the-art performance with an average face verification accuracy of 0.9567 (Real Gap = +0.0097) across LFW, CFP-FP, CPLFW, AgeDB, and CALFW datasets and 0.9366 (Real Gap = +0.0380) on the RFW dataset.
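
The Real Gap figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that Real Gap means synthetic accuracy minus real-data accuracy (inferred from the signs in the abstract, which does not define the term) and that the constrained and unconstrained averages share the same real-data baseline. The helper name `implied_real_accuracy` is hypothetical.

```python
def implied_real_accuracy(synthetic_acc: float, real_gap: float) -> float:
    """Recover the real-data accuracy, ASSUMING real_gap = synthetic - real."""
    return synthetic_acc - real_gap


# Values quoted in the abstract.
constrained = implied_real_accuracy(0.9405, -0.0065)    # same-size setting
unconstrained = implied_real_accuracy(0.9567, 0.0097)   # unconstrained setting
rfw = implied_real_accuracy(0.9366, 0.0380)             # RFW dataset

print(f"{constrained:.4f} {unconstrained:.4f} {rfw:.4f}")
```

Under these assumptions, both settings imply the same real-data average of about 0.9470, which suggests the two reported gaps are measured against one baseline.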

CV | May 21, 2025
FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion

Kazuaki Mishima, Antoni Bigata Casademunt, Stavros Petridis et al.

Human facial images encode a rich spectrum of information, encompassing both stable identity-related traits and mutable attributes such as pose, expression, and emotion. While recent advances in image generation have enabled high-quality identity-conditional face synthesis, precise control over non-identity attributes remains challenging, and disentangling identity from these mutable factors is particularly difficult. To address these limitations, we propose a novel identity-conditional diffusion model that introduces two lightweight control modules designed to independently manipulate facial pose, expression, and emotion without compromising identity preservation. These modules are embedded within the cross-attention layers of the base diffusion model, enabling precise attribute control with minimal parameter overhead. Furthermore, our tailored training strategy, which leverages cross-attention between the identity feature and each non-identity control feature, encourages identity features to remain orthogonal to control signals, enhancing controllability and diversity. Quantitative and qualitative evaluations, along with perceptual user studies, demonstrate that our method surpasses existing approaches in terms of control accuracy over pose, expression, and emotion, while also improving generative diversity under identity-only conditioning.

IV | Feb 4, 2025
Layer Separation: Adjustable Joint Space Width Images Synthesis in Conventional Radiography

Haolin Wang, Yafei Ou, Prasoon Ambalathankandy et al.

Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by joint inflammation and progressive structural damage. Joint space width (JSW) is a critical indicator in conventional radiography for evaluating disease progression, and its analysis has become a prominent research topic in computer-aided diagnostic (CAD) systems. However, deep learning-based radiological CAD systems for JSW analysis face significant challenges in data quality, including data imbalance, limited variety, and annotation difficulties. This work introduced a challenging image synthesis scenario and proposed Layer Separation Networks (LSN) to accurately separate the soft tissue layer, the upper bone layer, and the lower bone layer in conventional radiographs of finger joints. Using these layers, images with adjustable JSW can be synthesized to address data quality challenges and achieve ground truth (GT) generation. Experimental results demonstrated that LSN-based synthetic images closely resembled real radiographs and significantly enhanced performance in downstream tasks. The code and dataset will be available.

CV | Jun 17, 2024
Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification

Zhipeng Deng, Yuqiao Yang, Kenji Suzuki

Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data. Current FL research focuses on communication efficiency, privacy protection, and personalization, and assumes that the data for FL have already been ideally collected. In medical scenarios, however, data annotation demands both expertise and intensive labor, which is a critical problem in FL. Active learning (AL) has shown promising performance in reducing the number of data annotations in medical image analysis. We propose a federated AL (FedAL) framework in which AL is executed periodically and interactively under FL. We exploit a local model in each hospital and a global model acquired from FL to construct an ensemble. We use ensemble-entropy-based AL as an efficient data-annotation strategy in FL. Therefore, our FedAL framework can decrease the amount of annotated data and preserve patient privacy while maintaining the performance of FL. To our knowledge, this is the first FedAL framework applied to medical images. We validated our framework on real-world dermoscopic datasets. Using only 50% of samples, our framework was able to achieve state-of-the-art performance on a skin-lesion classification task. Our framework performed better than several state-of-the-art AL methods under FL and achieved comparable performance to full-data FL.
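
The ensemble-entropy selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the equal-weight averaging of the local and global predictions, the top-k selection rule, and the function names `ensemble_entropy` and `select_for_annotation` are all assumptions, since the abstract does not specify them.

```python
import math


def ensemble_entropy(local_probs, global_probs):
    """Shannon entropy of the averaged class probabilities.

    ASSUMPTION: the ensemble is an equal-weight average of the local
    (per-hospital) and global (federated) model outputs.
    """
    avg = [(p + q) / 2.0 for p, q in zip(local_probs, global_probs)]
    return -sum(p * math.log(p) for p in avg if p > 0.0)


def select_for_annotation(local_preds, global_preds, budget):
    """Return indices of the `budget` most uncertain unlabeled samples."""
    scores = [ensemble_entropy(l, g) for l, g in zip(local_preds, global_preds)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy predictions for three unlabeled samples over two classes.
local_preds = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
global_preds = [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
chosen = select_for_annotation(local_preds, global_preds, budget=2)
print(chosen)
```

Samples on which the averaged prediction is closest to uniform rank first, so annotation effort goes to the cases the two models are least sure about.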

LG | Jun 15, 2024
MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions

Lin Fan, Yafei Ou, Cenyang Zheng et al.

Multi-modal learning has shown exceptional performance in various tasks, especially in medical applications, where it integrates diverse medical information for comprehensive diagnostic evidence. However, several challenges remain in multi-modal learning: (1) heterogeneity between modalities, (2) uncertainty in missing modalities, (3) the influence of intrinsic noise, and (4) interpretability of the fusion result. This paper introduces the Modal-Domain Attention (MDA) model to address these challenges. MDA constructs linear relationships between modalities through continuous attention. Because it can adaptively allocate dynamic attention to different modalities, MDA can reduce attention to low-correlation data, missing modalities, or modalities with inherent noise, thereby maintaining SOTA performance across various tasks on multiple public datasets. Furthermore, our observations on the contribution of different modalities indicate that MDA aligns with established clinical diagnostic imaging gold standards and holds promise as a reference for pathologies where these standards are not yet clearly defined. The code and dataset will be available.

RO | Jan 10, 2022
Personal Mobility With Synchronous Trunk-Knee Passive Exoskeleton: Optimizing Human-Robot Energy Transfer

Diego Paez-Granados, Hideki Kadone, Modar Hassan et al.

We present a personal mobility device for lower-body impaired users through a lightweight exoskeleton on wheels. At its core, a novel passive exoskeleton provides postural transition leveraging natural body postures, with support to the trunk on sit-to-stand and stand-to-sit (STS) transitions by a single gas spring as an energy storage unit. We propose a direction-dependent coupling of knee and hip joints through a double-pulley wire system, transferring energy from the torso motion towards balancing the moment load at the knee joint actuator. In this way, the exoskeleton maximizes energy transfer and the naturalness of the user's movement. We introduce an embodied user interface for hands-free navigation through torso pressure sensing with minimal trunk rotations, averaging $19^{\circ} \pm 13^{\circ}$ across six unimpaired users. We evaluated the design for STS assistance on 11 unimpaired users, observing motions and muscle activity during the transitions. Results comparing assisted and unassisted STS transitions validated a significant reduction in muscle activity (up to $68\%$, $p<0.01$) at the involved muscle groups. Moreover, we showed it to be feasible through natural torso leaning movements of $+12^{\circ}\pm 6.5^{\circ}$ and $- 13.7^{\circ} \pm 6.1^{\circ}$ for standing and sitting, respectively. Passive postural transition assistance warrants further work on increasing its applicability and broadening the user population.

RO | Jul 28, 2021
Virtual Landmark-Based Control of Docking Support for Assistive Mobility Devices

Yang Chen, Diego Paez-Granados, Bruno Leme et al.

This work proposes an autonomous docking control for nonholonomic constrained mobile robots and applies it to an intelligent mobility device or wheelchair for assisting the user in approaching resting furniture such as a chair or a bed. We define a virtual landmark inferred from the target docking destination. Then, we solve the problem of keeping the targeted volume inside the field of view (FOV) of a tracking camera and docking to the virtual landmark through a novel definition that enables control of the desired end-pose. In this article, we propose a nonlinear feedback controller to perform the docking with the depth camera's FOV as a constraint. Then, a numerical method is proposed to find the feasible space of initial states where convergence can be guaranteed. Finally, the entire system was embedded for real-time operation on a standing wheelchair, with the virtual landmark estimated by 3D object tracking with an RGB-D camera, and we validated its effectiveness in simulation and experimental evaluations. The results show guaranteed convergence for the feasible space depending on the virtual landmark location. In the implementation, the robot converges to the virtual landmark while respecting the FOV constraints.

CV | Jul 9, 2021
A Multi-task Mean Teacher for Semi-supervised Facial Affective Behavior Analysis

Lingfeng Wang, Shisen Wang, Jin Qi et al.

Affective Behavior Analysis is an important part of human-computer interaction. Existing multi-task affective behavior recognition methods suffer from the problem of incompletely labeled datasets. To tackle this problem, this paper presents a semi-supervised model with a mean teacher framework to leverage additional unlabeled data. Specifically, a multi-task model is proposed to learn three different kinds of facial affective representations simultaneously. The proposed model is then used as both the student and the teacher network. When training with unlabeled data, the teacher network is employed to predict pseudo labels for student network training, which allows the student to learn from unlabeled data. Experimental results showed that our proposed method achieved much better performance than the baseline model and ranked 4th in both competition track 1 and track 2, and 6th in track 3, which verifies that the proposed network can effectively learn from incomplete datasets.
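
For readers unfamiliar with the mean teacher idea, the sketch below shows the weight update used in the standard formulation, where the teacher's weights track an exponential moving average (EMA) of the student's weights. The abstract does not state which update rule this paper uses, so the EMA rule, the decay value, and the name `ema_update` are assumptions for illustration only.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    ASSUMPTION: weights are flat lists of floats; a real model would apply
    the same rule to every parameter tensor.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy example: the teacher drifts towards a fixed student over three steps.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
print(teacher)
```

Because the teacher changes more slowly than the student, its predictions on unlabeled data tend to be more stable, which is what makes them usable as pseudo labels.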

LG | Mar 22, 2021
Data Cleansing for Deep Neural Networks with Storage-efficient Approximation of Influence Functions

Kenji Suzuki, Yoshiyuki Kobayashi, Takuya Narihira

Identifying the influence of training data for data cleansing can improve the accuracy of deep learning. An approach based on stochastic gradient descent (SGD), called SGD-influence, was proposed to calculate influence scores, but its calculation costs are high. It requires temporarily storing the model parameters during the training phase so that influence scores can be calculated in the inference phase. Building closely on this previous method, we propose a method that reduces the cache files needed to store parameters in the training phase for calculating influence scores. We adopt only the final parameters of the last epoch for the influence function calculation. In our experiments on classification, the cache size for training on the MNIST dataset with our approach is 1.236 MB, whereas the previous method used a cache size of 1.932 GB in the last epoch. This means that the cache size was reduced to 1/1,563. We also observed accuracy improvement from data cleansing by removing negatively influential data, using our approach as well as the previous method. Moreover, our simple and general method for calculating influence scores is available in our AutoML tool, Neural Network Console, which requires no programming. The source code is also available.
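
The reduction factor quoted above follows directly from the two cache sizes. The short check below assumes decimal units (1 GB = 1000 MB); the function name `cache_reduction_factor` is hypothetical.

```python
def cache_reduction_factor(previous_gb, proposed_mb, mb_per_gb=1000.0):
    """Ratio of the previous cache size to the proposed one, in the same unit."""
    return previous_gb * mb_per_gb / proposed_mb


# Cache sizes quoted in the abstract: 1.932 GB (previous) vs 1.236 MB (proposed).
factor = cache_reduction_factor(1.932, 1.236)
print(round(factor))
```

With binary units (1 GB = 1024 MB) the ratio would come out at roughly 1,600 instead, so the stated 1/1,563 is consistent with the decimal convention.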

RO | Mar 9, 2021
Passive Flow Control for Series Inflatable Actuators: Application on a Wearable Soft-Robot for Posture Assistance

Diego Paez-Granados, Takehiro Yamamoto, Hideki Kadone et al.

This paper presents a passive control method for multiple degrees of freedom in a soft pneumatic robot through the combination of flow resistor tubes with series inflatable actuators. We designed and developed these 3D printed resistors based on the pressure drop principle of multiple capillary orifices, which allows a passive control of its sequential activation from a single source of pressure. Our design fits in standard tube connectors, making it easy to adopt it on any other type of actuator with pneumatic inlets. We present its characterization of pressure drop and evaluation of the activation sequence for series and parallel circuits of actuators. Moreover, we present an application for the assistance of postural transition from lying to sitting. We embedded it in a wearable garment robot-suit designed for infants with cerebral palsy. Then, we performed the test with a dummy baby for emulating the upper-body motion control. The results show a sequential motion control of the sitting and lying transitions validating the proposed system for flow control and its application on the robot-suit.

LG | Feb 12, 2021
Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives

Takuya Narihira, Javier Alonsogarcia, Fabien Cardinaux et al.

While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, fast computation in distributed settings, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineers' perspectives, with usability and compatibility as its core design principles. We elaborate on each of our design principles and its merits, and validate our attempts via experiments.

HC | Aug 20, 2020
Facial movement synergies and Action Unit detection from distal wearable Electromyography and Computer Vision

Monica Perusquia-Hernandez, Felix Dollack, Chun Kwang Tan et al.

Distal facial Electromyography (EMG) can be used to detect smiles and frowns with reasonable accuracy. It capitalizes on volume conduction to detect relevant muscle activity, even when the electrodes are not placed directly on the source muscle. The main advantage of this method is to prevent occlusion and obstruction of the facial expression production, whilst allowing EMG measurements. However, measuring EMG distally entails that the exact source of the facial movement is unknown. We propose a novel method to estimate specific Facial Action Units (AUs) from distal facial EMG and Computer Vision (CV). This method is based on Independent Component Analysis (ICA), Non-Negative Matrix Factorization (NNMF), and sorting of the resulting components to determine which is the most likely to correspond to each CV-labeled action unit (AU). Performance on the detection of AU06 (Orbicularis Oculi) and AU12 (Zygomaticus Major) was estimated by calculating the agreement with Human Coders. The results of our proposed algorithm showed an accuracy of 81% and a Cohen's Kappa of 0.49 for AU6; and accuracy of 82% and a Cohen's Kappa of 0.53 for AU12. This demonstrates the potential of distal EMG to detect individual facial movements. Using this multimodal method, several AU synergies were identified. We quantified the co-occurrence and timing of AU6 and AU12 in posed and spontaneous smiles using the human-coded labels, and for comparison, using the continuous CV-labels. The co-occurrence analysis was also performed on the EMG-based labels to uncover the relationship between muscle synergies and the kinematics of visible facial movement.
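
The agreement figures above pair raw accuracy with Cohen's Kappa, which corrects observed agreement $p_o$ for the agreement $p_e$ expected by chance: $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below computes it for two label sequences. The toy labels are made up for illustration and are not data from the paper, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: algorithm output vs human coder for one action unit.
algorithm = [1, 1, 0, 0]
human = [1, 0, 0, 0]
kappa = cohens_kappa(algorithm, human)
print(kappa)
```

In this toy case the raw accuracy is 0.75 but the kappa is 0.5, which illustrates why an accuracy near 81% can coincide with a more moderate kappa near 0.5 when one label dominates.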

RO | Aug 3, 2020
Control Interface for Hands-free Navigation of Standing Mobility Vehicles based on Upper-Body Natural Movements

Yang Chen, Diego Paez-Granados, Hideki Kadone et al.

In this paper, we propose and evaluate a novel human-machine interface (HMI) for controlling a standing mobility vehicle or person carrier robot, aiming for hands-free control through upper-body natural postures derived from gaze tracking while walking. We target users with lower-body impairment who retain upper-body motion capabilities. The developed HMI is based on a sensing array for capturing body postures; an intent recognition algorithm for continuous mapping of body motions to robot control space; and a personalizing system for multiple body sizes and shapes. We performed two user studies: first, an analysis of the body muscles involved in navigating with the proposed control; and second, an assessment of the HMI compared with a standard joystick through quantitative and qualitative metrics in a narrow circuit task. We concluded that the main user control contribution comes from the Rectus Abdominis and Erector Spinae muscle groups at different levels. Finally, the comparative study showed that a joystick still outperforms the proposed HMI in usability perceptions and controllability metrics; however, the smoothness of user control was similar in jerk and fluency. Moreover, users' perceptions showed that hands-free control made it more anthropomorphic, animated, and even safer.

RO | Nov 15, 2017
A Systematic Literature Review of Experiments in Socially Assistive Robotics using Humanoid Robots

Floris Erich, Masakazu Hirokawa, Kenji Suzuki

We perform a Systematic Literature Review to discover how Humanoid robots are being applied in Socially Assistive Robotics experiments. Our search returned 24 papers, from which 16 were included for closer analysis. To do this analysis we used a conceptual framework inspired by Behavior-based Robotics. We were interested in finding out which robot was used (most use the robot NAO), what the goals of the application were (teaching, assisting, playing, instructing), how the robot was controlled (manually in most of the experiments), what kind of behaviors the robot exhibited (reacting to touch, pointing at body parts, singing a song, dancing, among others), what kind of actuators the robot used (always motors, sometimes speakers, hardly ever any other type of actuator) and what kind of sensors the robot used (in many studies the robot did not use any sensors at all, in others the robot frequently used camera and/or microphone). The results of this study can be used for designing software frameworks targeting Humanoid Socially Assistive Robotics, especially in the context of Software Product Line Engineering projects.