Syed Masum Billah

h-index: 15

Papers: 3

Citations: 9

Novelty: 42%

AI Score: 25

Ranked #165,927 of 194,257 authors (top 85%); #53,268 in CV (top 90%)

3 Papers

7.6 | CV | Aug 23, 2024

Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation

Md Touhidul Islam, Imran Kabir, Elena Ariel Pearce et al.

This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. Then, we refine the list through feedback from a focus group study involving blind individuals, low-vision individuals, and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and the object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.

3.6 | CV | May 28, 2025

IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth

Md Touhidul Islam, Imran Kabir, Md Alimoor Reza et al.

We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to evaluate model reliability. IKIWISI introduces "spy objects": adversarial instances users know are absent, to discern models hallucinating on nonexistent items. The tool functions as a cognitive audit mechanism, surfacing mismatches between human and machine perception by visualizing where models diverge from human understanding. Our study with 15 participants found that users considered IKIWISI easy to use, made assessments that correlated with objective metrics when available, and reached informed conclusions by examining only a small fraction of heatmap cells. This approach not only complements traditional evaluation methods through visual assessment of model behavior with custom object sets, but also reveals opportunities for improving alignment between human perception and machine understanding in vision-language systems.
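
The heatmap and "spy object" ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input format (one set of reported object labels per video frame), and the text rendering with `G`/`R` standing in for green/red cells are all assumptions made for illustration.

```python
from typing import List, Set

def build_heatmap(detections: List[Set[str]], objects: List[str]) -> List[List[int]]:
    """Rows are objects, columns are frames; 1 = model reports presence, 0 = absence.

    `detections` is a hypothetical input format: one set of labels per frame.
    """
    return [[1 if obj in frame else 0 for frame in detections] for obj in objects]

def spy_hallucination_rate(heatmap: List[List[int]], objects: List[str],
                           spy_objects: Set[str]) -> float:
    """Fraction of spy-object cells the model marked as present.

    Spy objects are known to be absent, so every 1 in their rows is a hallucination.
    """
    cells = [c for obj, row in zip(objects, heatmap) if obj in spy_objects for c in row]
    return sum(cells) / len(cells) if cells else 0.0

def render(heatmap: List[List[int]], objects: List[str]) -> str:
    """Text stand-in for the colored grid: G = present (green), R = absent (red)."""
    width = max(len(o) for o in objects)
    return "\n".join(
        f"{obj.ljust(width)} " + "".join("G" if c else "R" for c in row)
        for obj, row in zip(objects, heatmap)
    )

if __name__ == "__main__":
    # Made-up model outputs for four frames; "giraffe" is the spy object.
    frames = [{"car", "tree"}, {"car"}, {"car", "giraffe"}, {"tree"}]
    objects = ["car", "tree", "giraffe"]
    grid = build_heatmap(frames, objects)
    print(render(grid, objects))
    print("spy hallucination rate:", spy_hallucination_rate(grid, objects, {"giraffe"}))
```

In this toy input the spy row contains one "present" cell out of four, so a viewer scanning the grid would see a single green cell where none should appear.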

2.9 | HC | Feb 3, 2022

Feasibility of Interactive 3D Map for Remote Sighted Assistance

Jingyi Xie, Rui Yu, Sooyeon Lee et al.

Remote sighted assistance (RSA) has emerged as a conversational assistive technology, where remote sighted workers, i.e., agents, provide real-time assistance to users with vision impairments via video-chat-like communication. Researchers found that agents' lack of environmental knowledge, the difficulty of orienting users in their surroundings, and the inability to estimate distances from users' camera feeds are key challenges for sighted agents. To address these challenges, researchers have suggested assisting agents with computer vision technologies, especially 3D reconstruction. This paper presents a high-fidelity prototype of such an RSA, where agents use interactive 3D maps with localization capability. We conducted a walkthrough study with thirteen agents and one user with simulated vision impairment using this prototype. The study revealed that, compared to baseline RSA, the agents were significantly faster in providing navigational assistance to users, and their mental workload was significantly reduced. Both findings indicate the feasibility and promise of 3D maps in RSA.