Basim Azam

h-index4

5papers

7citations

Novelty50%

AI Score38

Ranked #86,537 of 194,257 authors (top 45%)#29,142 in CV (top 49%)

5 Papers

1.4CVApr 13, 2022

Deep Learning Model with GA based Feature Selection and Context Integration

Ranju Mandal, Basim Azam, Brijesh Verma et al.

Deep learning models have been very successful in computer vision and image processing applications. Since its inception, Many top-performing methods for image segmentation are based on deep CNN models. However, deep CNN models fail to integrate global and local context alongside visual features despite having complex multi-layer architectures. We propose a novel three-layered deep learning model that assiminlate or learns independently global and local contextual information alongside visual features. The novelty of the proposed model is that One-vs-All binary class-based learners are introduced to learn Genetic Algorithm (GA) optimized features in the visual layer, followed by the contextual layer that learns global and local contexts of an image, and finally the third layer integrates all the information optimally to obtain the final class label. Stanford Background and CamVid benchmark image parsing datasets were used for our model evaluation, and our model shows promising results. The empirical analysis reveals that optimized visual features with global and local contextual information play a significant role to improve accuracy and produce stable predictions comparable to state-of-the-art deep CNN models.

1.4CVApr 13, 2022

Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing

Ranju Mandal, Basim Azam, Brijesh Verma

Deep learning models have been efficient lately on image parsing tasks. However, deep learning models are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information. The novel idea here is to have a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and then an integration layer to learn from both via genetic algorithm-based optimal fusion to produce a final decision. The experimental outcomes when evaluated on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and make stable predictions.

13.1CVMar 19, 2025Code

GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector

Zechuan Li, Hongshan Yu, Yihao Ding et al.

We propose GO-N3RDet, a scene-geometry optimized multi-view 3D object detector enhanced by neural radiance fields. The key to accurate 3D object detection is in effective voxel representation. However, due to occlusion and lack of 3D information, constructing 3D features from multi-view 2D images is challenging. Addressing that, we introduce a unique 3D positional information embedded voxel optimization mechanism to fuse multi-view features. To prioritize neural field reconstruction in object regions, we also devise a double importance sampling scheme for the NeRF branch of our detector. We additionally propose an opacity optimization module for precise voxel opacity prediction by enforcing multi-view consistency constraints. Moreover, to further improve voxel density consistency across multiple perspectives, we incorporate ray distance as a weighting factor to minimize cumulative ray errors. Our unique modules synergetically form an end-to-end neural model that establishes new state-of-the-art in NeRF-based multi-view 3D detection, verified with extensive experiments on ScanNet and ARKITScenes. Code will be available at https://github.com/ZechuanLi/GO-N3RDet.

3.6CVSep 29, 2025

GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs

Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh et al.

Object hallucination in Multimodal Large Language Models (MLLMs) is a persistent failure mode that causes the model to perceive objects absent in the image. This weakness of MLLMs is currently studied using static benchmarks with fixed visual scenarios, which preempts the possibility of uncovering model-specific or unanticipated hallucination vulnerabilities. We introduce GHOST (Generating Hallucinations via Optimizing Stealth Tokens), a method designed to stress-test MLLMs by actively generating images that induce hallucination. GHOST is fully automatic and requires no human supervision or prior knowledge. It operates by optimizing in the image embedding space to mislead the model while keeping the target object absent, and then guiding a diffusion model conditioned on the embedding to generate natural-looking images. The resulting images remain visually natural and close to the original input, yet introduce subtle misleading cues that cause the model to hallucinate. We evaluate our method across a range of models, including reasoning models like GLM-4.1V-Thinking, and achieve a hallucination success rate exceeding 28%, compared to around 1% in prior data-driven discovery methods. We confirm that the generated images are both high-quality and object-free through quantitative metrics and human evaluation. Also, GHOST uncovers transferable vulnerabilities: images optimized for Qwen2.5-VL induce hallucinations in GPT-4o at a 66.5% rate. Finally, we show that fine-tuning on our images mitigates hallucination, positioning GHOST as both a diagnostic and corrective tool for building more reliable multimodal systems.

2.4IVSep 29, 2021

Multipath CNN with alpha matte inference for knee tissue segmentation from MRI

Sheheryar Khan, Basim Azam, Yongcheng Yao et al.

Precise segmentation of knee tissues from magnetic resonance imaging (MRI) is critical in quantitative imaging and diagnosis. Convolutional neural networks (CNNs), which are state of the art, have limitations owing to the lack of image-specific adaptation, such as low tissue contrasts and structural inhomogeneities, thereby leading to incomplete segmentation results. This paper presents a deep learning based automatic segmentation framework for knee tissue segmentation. A novel multipath CNN-based method is proposed, which consists of an encoder decoder-based segmentation network in combination with a low rank tensor-reconstructed segmentation network. Low rank reconstruction in MRI tensor sub-blocks is introduced to exploit the structural and morphological variations in knee tissues. To further improve the segmentation from CNNs, trimap generation, which effectively utilizes superimposed regions, is proposed for defining high, medium and low confidence regions from the multipath CNNs. The secondary path with low rank reconstructed input mitigates the conditions in which the primary segmentation network can potentially fail and overlook the boundary regions. The outcome of the segmentation is solved as an alpha matting problem by blending the trimap with the source input. Experiments on Osteoarthritis Initiative (OAI) datasets and a self prepared scan validate the effectiveness of the proposed method. We specifically demonstrate the application of the proposed method in a cartilage segmentation based thickness map for diagnosis purposes.