CV · AI · Sep 26, 2021

ViT Cane: Visual Assistant for the Visually Impaired

arXiv:2109.13857v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses navigation challenges for blind and visually impaired individuals, but its contribution is incremental: it applies an existing vision transformer method to a specific domain.

The paper tackles obstacle detection for visually impaired navigation by proposing ViT Cane, a system built on a vision transformer model that is reported to outperform CNN-based models on the COCO dataset.

Blind and visually challenged individuals face multiple issues when navigating the world independently. These challenges include finding the shortest path to a destination and detecting obstacles from a distance. To tackle these issues, this paper proposes ViT Cane, which leverages a vision transformer model to detect obstacles in real time. Our entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8 GB of RAM, and 4 motors. By delivering tactile feedback through the 4 motors, the obstacle detection model helps visually impaired users navigate unknown terrain efficiently, and the system is designed to be easily reproduced. The paper discusses the utility of a vision transformer model in comparison to CNN-based models for this specific application. Through rigorous testing, the proposed obstacle detection model has achieved higher performance on the Common Objects in Context (COCO) dataset than its CNN counterpart. Comprehensive field tests were conducted to verify the effectiveness of our system for holistic indoor understanding and obstacle avoidance.
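
As a rough illustration of how detections from the camera might be turned into tactile feedback on the 4 motors, the sketch below splits the image width into four equal zones and assigns one motor to each. This zone layout, the `Detection` type, the `motor_intensities` function, and the 0.5 confidence threshold are all hypothetical choices made for illustration; the abstract does not describe the actual mapping, and no hardware or model code is included here.

```python
from dataclasses import dataclass
from typing import List

# The system described above has 4 motors. Assigning each motor to one
# vertical strip of the image is an ASSUMPTION made for this sketch.
NUM_MOTORS = 4


@dataclass
class Detection:
    """Hypothetical detector output for one obstacle."""
    x_min: float  # left edge of the bounding box, normalized to [0, 1]
    x_max: float  # right edge of the bounding box, normalized to [0, 1]
    score: float  # detector confidence in [0, 1]


def motor_intensities(detections: List[Detection],
                      threshold: float = 0.5) -> List[float]:
    """Return one vibration intensity in [0, 1] per motor.

    A motor's intensity is the highest confidence among the detections
    whose box overlaps that motor's zone. Detections below `threshold`
    (an assumed value) are ignored.
    """
    intensities = [0.0] * NUM_MOTORS
    zone_width = 1.0 / NUM_MOTORS
    for det in detections:
        if det.score < threshold:
            continue
        for zone in range(NUM_MOTORS):
            zone_left = zone * zone_width
            zone_right = zone_left + zone_width
            # Positive overlap means the box intersects this zone.
            overlap = min(det.x_max, zone_right) - max(det.x_min, zone_left)
            if overlap > 0:
                intensities[zone] = max(intensities[zone], det.score)
    return intensities


if __name__ == "__main__":
    frame = [
        Detection(x_min=0.0, x_max=0.2, score=0.9),  # obstacle on the far left
        Detection(x_min=0.6, x_max=0.7, score=0.8),  # obstacle right of center
        Detection(x_min=0.3, x_max=0.4, score=0.2),  # below threshold, ignored
    ]
    print(motor_intensities(frame))
```

On a real device, each returned intensity could be written to a PWM output driving the corresponding motor; that wiring is left out because the abstract does not specify it.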
