Yasuhisa Hasegawa

CV
h-index10
9papers
205citations
Novelty44%
AI Score47

9 Papers

ROAug 29, 2023
LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks

Haokun Liu, Yaonan Zhu, Kenji Kato et al. · utoronto

This paper presents a novel approach to enhance autonomous robotic manipulation using the Large Language Model (LLM) for logical inference, converting high-level language commands into sequences of executable motion functions. The proposed system combines the advantage of LLM with YOLO-based environmental perception to enable robots to autonomously make reasonable decisions and task planning based on the given commands. Additionally, to address the potential inaccuracies or illogical actions arising from LLM, a combination of teleoperation and Dynamic Movement Primitives (DMP) is employed for action correction. This integration aims to improve the practicality and generalizability of the LLM-based human-robot collaboration system.

CVJan 16Code
Bio-inspired fine-tuning for selective transfer learning in image classification

Ana Davila, Jacinto Colan, Yasuhisa Hasegawa

Deep learning has significantly advanced image analysis across diverse domains but often depends on large, annotated datasets for success. Transfer learning addresses this challenge by utilizing pre-trained models to tackle new tasks with limited labeled data. However, discrepancies between source and target domains can hinder effective transfer learning. We introduce BioTune, a novel adaptive fine-tuning technique utilizing evolutionary optimization. BioTune enhances transfer learning by optimally choosing which layers to freeze and adjusting learning rates for unfrozen layers. Through extensive evaluation on nine image classification datasets, spanning natural and specialized domains such as medical imaging, BioTune demonstrates superior accuracy and efficiency over state-of-the-art fine-tuning methods, including AutoRGN and LoRA, highlighting its adaptability to various data characteristics and distribution changes. Additionally, BioTune consistently achieves top performance across four different CNN architectures, underscoring its flexibility. Ablation studies provide valuable insights into the impact of BioTune's key components on overall performance. The source code is available at https://github.com/davilac/BioTune.

ROSep 16, 2024
Embedded Image-to-Image Translation for Efficient Sim-to-Real Transfer in Learning-based Robot-Assisted Soft Manipulation

Jacinto Colan, Keisuke Sugita, Ana Davila et al.

Recent advances in robotic learning in simulation have shown impressive results in accelerating learning complex manipulation skills. However, the sim-to-real gap, caused by discrepancies between simulation and reality, poses significant challenges for the effective deployment of autonomous surgical systems. We propose a novel approach utilizing image translation models to mitigate domain mismatches and facilitate efficient robot skill learning in a simulated environment. Our method involves the use of contrastive unpaired Image-to-image translation, allowing for the acquisition of embedded representations from these transformed images. Subsequently, these embeddings are used to improve the efficiency of training surgical manipulation models. We conducted experiments to evaluate the performance of our approach, demonstrating that it significantly enhances task success rates and reduces the steps required for task completion compared to traditional methods. The results indicate that our proposed system effectively bridges the sim-to-real gap, providing a robust framework for advancing the autonomy of surgical robots in minimally invasive procedures.

CLJul 16, 2025
Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate

Ana Davila, Jacinto Colan, Yasuhisa Hasegawa

Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. However, they face challenges such as ambiguity in user requests processed by LLMs. To address these challenges, this paper introduces and evaluates a multi-agent debate framework designed to enhance detection and resolution capabilities beyond single models. The framework consists of three LLM architectures (Llama3-8B, Gemma2-9B, and Mistral-7B variants) and a dataset with diverse ambiguities. The debate framework markedly enhanced the performance of Llama3-8B and Mistral-7B variants over their individual baselines, with Mistral-7B-led debates achieving a notable 76.7% success rate and proving particularly effective for complex ambiguities and efficient consensus. While acknowledging varying model responses to collaborative strategies, these findings underscore the debate framework's value as a targeted method for augmenting LLM capabilities. This work offers important insights for developing more robust and adaptive language understanding systems by showing how structured debates can lead to improved clarity in interactive systems.

CVOct 17, 2025
Adaptive transfer learning for surgical tool presence detection in laparoscopic videos through gradual freezing fine-tuning

Ana Davila, Jacinto Colan, Yasuhisa Hasegawa

Minimally invasive surgery can benefit significantly from automated surgical tool detection, enabling advanced analysis and assistance. However, the limited availability of annotated data in surgical settings poses a challenge for training robust deep learning models. This paper introduces a novel staged adaptive fine-tuning approach consisting of two steps: a linear probing stage to condition additional classification layers on a pre-trained CNN-based architecture and a gradual freezing stage to dynamically reduce the fine-tunable layers, aiming to regulate adaptation to the surgical domain. This strategy reduces network complexity and improves efficiency, requiring only a single training loop and eliminating the need for multiple iterations. We validated our method on the Cholec80 dataset, employing CNN architectures (ResNet-50 and DenseNet-121) pre-trained on ImageNet for detecting surgical tools in cholecystectomy endoscopic videos. Our results demonstrate that our method improves detection performance compared to existing approaches and established fine-tuning techniques, achieving a mean average precision (mAP) of 96.4%. To assess its broader applicability, the generalizability of the fine-tuning strategy was further confirmed on the CATARACTS dataset, a distinct domain of minimally invasive ophthalmic surgery. These findings suggest that gradual freezing fine-tuning is a promising technique for improving tool presence detection in diverse surgical procedures and may have broader applications in general image classification tasks.

CVAug 21, 2025
Transfer learning optimization based on evolutionary selective fine tuning

Jacinto Colan, Ana Davila, Yasuhisa Hasegawa

Deep learning has shown substantial progress in image analysis. However, the computational demands of large, fully trained models remain a consideration. Transfer learning offers a strategy for adapting pre-trained models to new tasks. Traditional fine-tuning often involves updating all model parameters, which can potentially lead to overfitting and higher computational costs. This paper introduces BioTune, an evolutionary adaptive fine-tuning technique that selectively fine-tunes layers to enhance transfer learning efficiency. BioTune employs an evolutionary algorithm to identify a focused set of layers for fine-tuning, aiming to optimize model performance on a given target task. Evaluation across nine image classification datasets from various domains indicates that BioTune achieves competitive or improved accuracy and efficiency compared to existing fine-tuning methods such as AutoRGN and LoRA. By concentrating the fine-tuning process on a subset of relevant layers, BioTune reduces the number of trainable parameters, potentially leading to decreased computational cost and facilitating more efficient transfer learning across diverse data characteristics and distributions.

ROJun 20, 2024
Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration

Haokun Liu, Yaonan Zhu, Kenji Kato et al.

Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.

CVJun 14, 2024
Comparison of fine-tuning strategies for transfer learning in medical image classification

Ana Davila, Jacinto Colan, Yasuhisa Hasegawa

In the context of medical imaging and machine learning, one of the most pressing challenges is the effective adaptation of pre-trained models to specialized medical contexts. Despite the availability of advanced pre-trained models, their direct application to the highly specialized and diverse field of medical imaging often falls short due to the unique characteristics of medical data. This study provides a comprehensive analysis on the performance of various fine-tuning methods applied to pre-trained models across a spectrum of medical imaging domains, including X-ray, MRI, Histology, Dermoscopy, and Endoscopic surgery. We evaluated eight fine-tuning strategies, including standard techniques such as fine-tuning all layers or fine-tuning only the classifier layers, alongside methods such as gradually unfreezing layers, regularization based fine-tuning and adaptive learning rates. We selected three well-established CNN architectures (ResNet-50, DenseNet-121, and VGG-19) to cover a range of learning and feature extraction scenarios. Although our results indicate that the efficacy of these fine-tuning methods significantly varies depending on both the architecture and the medical imaging type, strategies such as combining Linear Probing with Full Fine-tuning resulted in notable improvements in over 50% of the evaluated cases, demonstrating general effectiveness across medical domains. Moreover, Auto-RGN, which dynamically adjusts learning rates, led to performance enhancements of up to 11% for specific modalities. Additionally, the DenseNet architecture showed more pronounced benefits from alternative fine-tuning approaches compared to traditional full fine-tuning. This work not only provides valuable insights for optimizing pre-trained models in medical image analysis but also suggests the potential for future research into more advanced architectures and fine-tuning methods.

RODec 16, 2020
Natural grasp intention recognition based on gaze fixation in human-robot interaction

Bo Yang, Jian Huang, Xiaolong Li et al.

Eye movement is closely related to limb actions, so it can be used to infer movement intentions. More importantly, in some cases, eye movement is the only way for paralyzed and impaired patients with severe movement disorders to communicate and interact with the environment. Despite this, eye-tracking technology still has very limited application scenarios as an intention recognition method. The goal of this paper is to achieve a natural fixation-based grasping intention recognition method, with which a user with hand movement disorders can intuitively express what tasks he/she wants to do by directly looking at the object of interest. Toward this goal, we design experiments to study the relationships of fixations in different tasks. We propose some quantitative features from these relationships and analyze them statistically. Then we design a natural method for grasping intention recognition. The experimental results prove that the accuracy of the proposed method for the grasping intention recognition exceeds 89\% on the training objects. When this method is extendedly applied to objects not included in the training set, the average accuracy exceeds 85\%. The grasping experiment in the actual environment verifies the effectiveness of the proposed method.