Ryo Watanabe

CV
h-index12
3papers
6citations
Novelty28%
AI Score34

3 Papers

47.6ROMay 13
When Absolute State Fails: Evaluating Proprioceptive Encodings for Robust Manipulation

Maxime Alvarez, Ryo Watanabe, Paul Crook et al.

As end-to-end robotic policies are progressively deployed in the real world to solve real tasks, they face a gap between the training and inference conditions. Scaling the amount and diversity of the training data has shown some success in improving zero-shot generalization, yet robots still fail when faced with new, unseen test conditions. For instance, while robots with fixed frames of reference are common, those with moving frames pose a greater challenge for deployment. To address this specific instance of the issue, we present a study of strategies for encoding the robot's proprioceptive state to improve both in- and out-of-distribution performance at test time. Through a systematic study of joint representations, we find that a simple episode-wise relative frame provides the best trade-off between task performance and robustness, outperforming the baselines in extensive real-robot experiments conducted in a realistic test environment. The results suggest a practical path to leveraging data collected by robots with varying frames of reference and deployment to unseen test configurations.

CVDec 4, 2023
Fully Spiking Denoising Diffusion Implicit Models

Ryo Watanabe, Yusuke Mukuta, Tatsuya Harada

Spiking neural networks (SNNs) have garnered considerable attention owing to their ability to run on neuromorphic devices with super-high speeds and remarkable energy efficiencies. SNNs can be used in conventional neural network-based time- and energy-consuming applications. However, research on generative models within SNNs remains limited, despite their advantages. In particular, diffusion models are a powerful class of generative models, whose image generation quality surpass that of the other generative models, such as GANs. However, diffusion models are characterized by high computational costs and long inference times owing to their iterative denoising feature. Therefore, we propose a novel approach fully spiking denoising diffusion implicit model (FSDDIM) to construct a diffusion model within SNNs and leverage the high speed and low energy consumption features of SNNs via synaptic current learning (SCL). SCL fills the gap in that diffusion models use a neural network to estimate real-valued parameters of a predefined probabilistic distribution, whereas SNNs output binary spike trains. The SCL enables us to complete the entire generative process of diffusion models exclusively using SNNs. We demonstrate that the proposed method outperforms the state-of-the-art fully spiking generative model.

IROct 15, 2025
Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models

Yuki Yada, Sho Akiyama, Ryo Watanabe et al.

On large-scale e-commerce platforms with tens of millions of active monthly users, recommending visually similar products is essential for enabling users to efficiently discover items that align with their preferences. This study presents the application of a vision-language model (VLM) -- which has demonstrated strong performance in image recognition and image-text retrieval tasks -- to product recommendations on Mercari, a major consumer-to-consumer marketplace used by more than 20 million monthly users in Japan. Specifically, we fine-tuned SigLIP, a VLM employing a sigmoid-based contrastive loss, using one million product image-title pairs from Mercari collected over a three-month period, and developed an image encoder for generating item embeddings used in the recommendation system. Our evaluation comprised an offline analysis of historical interaction logs and an online A/B test in a production environment. In offline analysis, the model achieved a 9.1% improvement in nDCG@5 compared with the baseline. In the online A/B test, the click-through rate improved by 50% whereas the conversion rate improved by 14% compared with the existing model. These results demonstrate the effectiveness of VLM-based encoders for e-commerce product recommendations and provide practical insights into the development of visual similarity-based recommendation systems.