CV AI HC LGJul 15, 2024

GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino

arXiv:2407.10870v13.74 citationsh-index: 7

Originality Synthesis-oriented

AI Analysis

This addresses a specialized task in healthcare or human-computer interaction, but it is incremental as it applies an existing model to a new domain.

The paper tackled the problem of decoding hand gestures from forearm ultrasound images using GPT-4o without fine-tuning, achieving improvements with few-shot learning, though no concrete numbers were provided.

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and improves with few-shot, in-context learning.

View on arXiv PDF

Similar