Mohammad Asiful Hossain

CV
6 papers
158 citations
Novelty 48%
AI Score 42

6 Papers

CV · Dec 4, 2025 · Code
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain et al.

Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets and benchmarks aimed at improving temporal reasoning, but these have emphasized other video content, including sports, cooking, and movies. No existing benchmark focuses exclusively on the unique challenges of temporal understanding in ego-centric AD footage. To fill this gap, the Temporal Understanding in Autonomous Driving (TAD) benchmark is presented, which evaluates VLMs' ability to capture the dynamic relationships between actions in AD. TAD comprises nearly 6,000 question-answer (QA) pairs spanning 7 human-designed tasks. In addition, an evaluation is performed covering 9 closed- and open-source generalist models as well as SoTA AD specialist models. When applied to TAD, current SoTA models demonstrated substandard accuracies, largely due to imperfect fine-grained motion understanding. To improve motion understanding and overall accuracy on TAD, two novel training-free solutions are proposed: Scene-CoT, which leverages Chain-of-Thought (CoT) reasoning, and TCogMap, which incorporates an ego-centric temporal cognitive map. The proposed approaches are integrated with existing VLMs and improve average accuracy on TAD by up to 17.72%. By introducing TAD, benchmarking multiple SoTA models, and proposing effective enhancements, this work aims to catalyze future research on temporal understanding in AD. The benchmark and evaluation code are available on Hugging Face (https://huggingface.co/datasets/vbdai/TAD) and GitHub (https://github.com/vbdi/tad_bench), respectively.

CV · Jan 1
CPPO: Contrastive Perception for Vision Language Policy Optimization

Ahmad Rezaei, Mohsen Gholami, Saeed Ranjbar Alvar et al.

We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both the perception and reasoning aspects. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tokens from reasoning tokens is difficult: existing approaches require extra LLMs, ground-truth data, or forced separation of perception from reasoning by the policy model, or else apply rewards indiscriminately to all output tokens. CPPO addresses this problem by detecting perception tokens via entropy shifts in the model outputs under perturbed input images. CPPO then extends the RL objective function with a Contrastive Perception Loss (CPL) that enforces consistency under information-preserving perturbations and sensitivity under information-removing ones. Experiments show that CPPO surpasses previous perception-rewarding methods while avoiding extra models, making training more efficient and scalable.
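The idea of detecting perception tokens through entropy shifts can be illustrated with a small sketch. This is not the CPPO implementation: the abstract does not give the detection rule, so the function names, the use of the absolute shift, and the `threshold` value below are all assumptions made for illustration, and the per-token probability distributions are made-up toy values.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_shifts(clean_dists, perturbed_dists):
    """Per-token entropy change between the clean-image and perturbed-image runs."""
    return [entropy(q) - entropy(p) for p, q in zip(clean_dists, perturbed_dists)]


def flag_perception_tokens(clean_dists, perturbed_dists, threshold=0.5):
    """Flag tokens whose entropy moves by more than `threshold` (assumed rule)."""
    shifts = entropy_shifts(clean_dists, perturbed_dists)
    return [abs(s) > threshold for s in shifts]


# Toy example (hypothetical numbers): two output tokens, vocabulary of size 4.
clean = [
    [0.97, 0.01, 0.01, 0.01],  # token 0: confident when the image is intact
    [0.70, 0.10, 0.10, 0.10],  # token 1: moderately confident
]
perturbed = [
    [0.25, 0.25, 0.25, 0.25],  # token 0: becomes uniform once the image is perturbed
    [0.68, 0.12, 0.10, 0.10],  # token 1: barely changes
]
flags = flag_perception_tokens(clean, perturbed)
```

In this toy case only token 0 is flagged: its distribution collapses to uniform when the image is perturbed, which suggests it depended on visual content, while token 1 is nearly unaffected.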

CV · Oct 27, 2021
International Workshop on Continual Semi-Supervised Learning: Introduction, Benchmarks and Baselines

Ajmal Shahbaz, Salman Khan, Mohammad Asiful Hossain et al.

The aim of this paper is to formalize a new continual semi-supervised learning (CSSL) paradigm, brought to the attention of the machine learning community via the IJCAI 2021 International Workshop on Continual Semi-Supervised Learning (CSSL-IJCAI), with the goal of raising awareness of this problem in the field and mobilizing effort in this direction. After a formal definition of continual semi-supervised learning and the appropriate training and testing protocols, the paper introduces two new benchmarks specifically designed to assess CSSL on two important computer vision tasks: activity recognition and crowd counting. We describe the Continual Activity Recognition (CAR) and Continual Crowd Counting (CCC) challenges built upon those benchmarks and the baseline models proposed for the challenges, and present a simple CSSL baseline that applies batch self-training in temporal sessions for a limited number of rounds. The results show that learning from unlabelled data streams is extremely challenging and should stimulate the search for methods that can encode the dynamics of the data stream.

CV · Mar 5, 2019
Crowd Counting Using Scale-Aware Attention Networks

Mohammad Asiful Hossain, Mehrdad Hosseinzadeh, Omit Chanda et al.

In this paper, we consider the problem of crowd counting in images. Given an image of a crowded scene, our goal is to estimate the density map of this image, where each pixel value in the density map corresponds to the crowd density at the corresponding location in the image. Given the estimated density map, the final crowd count can be obtained by summing over all values in the density map. One challenge of crowd counting is the scale variation in images. In this work, we propose a novel scale-aware attention network to address this challenge. Using the attention mechanism popular in recent deep learning architectures, our model can automatically focus on the global and local scales appropriate for the image. By combining global and local scale attention, our model outperforms other state-of-the-art methods for crowd counting on several benchmark datasets.
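The step from density map to count described above is a plain sum over pixels. The sketch below shows only that final step, not the scale-aware attention network itself; the function name `crowd_count` and the toy density values are hypothetical.

```python
def crowd_count(density_map):
    """Sum every pixel of a 2D density map to obtain the estimated count."""
    return sum(sum(row) for row in density_map)


# Toy 3x4 density map (hypothetical values): two people, each spread over
# a few neighbouring pixels so that each person's mass sums to 1.
toy_density = [
    [0.0, 0.25, 0.25, 0.0],
    [0.0, 0.25, 0.25, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
estimated_count = crowd_count(toy_density)
```

Because each person contributes a total mass of 1 to the map, the sum recovers the number of people regardless of how widely each person's density is spread.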

CV · Aug 20, 2017
An Efficient Single Chord-based Accumulation Technique (SCA) to Detect More Reliable Corners

Mohammad Asiful Hossain, Abdul Kawsar Tushar, Shofiullah Babor

Corner detection is a vital operation in numerous computer vision applications. The Chord-to-Point Distance Accumulation (CPDA) detector is recognized as the contour-based corner detector that produces the lowest localization error when localizing corners in an image. However, our experiments demonstrate that the CPDA detector often misses some potential corners. Moreover, the detection algorithm of CPDA is computationally costly. In this paper, we focus on reducing localization error as well as increasing average repeatability. The preprocessing and refinement steps of the proposed process are similar to those of CPDA. Our experimental results show the effectiveness and robustness of the proposed process over CPDA.

CV · Feb 16, 2017
Chord Angle Deviation using Tangent (CADT), an Efficient and Robust Contour-based Corner Detector

Mohammad Asiful Hossain, Abdul Kawsar Tushar

Corner detection is an essential process in a large number of computer vision and image processing applications. We review a number of popular contour-based corner detectors in this paper. Among these detectors, chord to triangular arm angle (CTAA) has been demonstrated to be the strongest in terms of average repeatability. We introduce a new, effective method to calculate the value of curvature. Experimental results demonstrate that our proposed technique outperforms CTAA and the other detectors considered in this paper. The results show that our proposed method is simple yet efficient at finding corners more accurately and reliably.