Rafia Rahim

CV
h-index58
7papers
303citations
Novelty52%
AI Score43

7 Papers

CVApr 14
4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview

Benjamin Kiefer, Jan Lukas Augustin, Jon Muhovič et al.

The 4th Workshop on Maritime Computer Vision (MaCVi) is organized as part of CVPR 2026. This edition features five benchmark challenges with emphasis on both predictive accuracy and embedded real-time feasibility. This report summarizes the MaCVi 2026 challenge setup, evaluation protocols, datasets, and benchmark tracks, and presents quantitative results, qualitative comparisons, and cross-challenge analyses of emerging method trends. We also include technical reports from top-performing teams to highlight practical design choices and lessons learned across the benchmark suite. Datasets, leaderboards, and challenge resources are available at https://macvi.org/workshop/cvpr26.

CVAug 22, 2021Code
MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching

Faranak Shamsafar, Samuel Woerz, Rafia Rahim et al.

Recent methods in stereo matching have continuously improved the accuracy using deep models. This gain, however, is attained with a high increase in computation cost, such that the network may not fit even on a moderate GPU. This issue raises problems when the model needs to be deployed on resource-limited devices. For this, we propose two light models for stereo vision with reduced complexity and without sacrificing accuracy. Depending on the dimension of cost volume, we design a 2D and a 3D model with encoder-decoders built from 2D and 3D convolutions, respectively. To this end, we leverage 2D MobileNet blocks and extend them to 3D for stereo vision application. Besides, a new cost volume is proposed to boost the accuracy of the 2D model, making it performing close to 3D networks. Experiments show that the proposed 2D/3D networks effectively reduce the computational expense (27%/95% and 72%/38% fewer parameters/operations in 2D and 3D models, respectively) while upholding the accuracy. Our code is available at https://github.com/cogsys-tuebingen/mobilestereonet.

CVMar 24, 2025
LeanStereo: A Leaner Backbone based Stereo Network

Rafia Rahim, Samuel Woerz, Andreas Zell

Recently, end-to-end deep networks based stereo matching methods, mainly because of their performance, have gained popularity. However, this improvement in performance comes at the cost of increased computational and memory bandwidth requirements, thus necessitating specialized hardware (GPUs); even then, these methods have large inference times compared to classical methods. This limits their applicability in real-world applications. Although we desire high accuracy stereo methods albeit with reasonable inference time. To this end, we propose a fast end-to-end stereo matching method. Majority of this speedup comes from integrating a leaner backbone. To recover the performance lost because of a leaner backbone, we propose to use learned attention weights based cost volume combined with LogL1 loss for stereo matching. Using LogL1 loss not only improves the overall performance of the proposed network but also leads to faster convergence. We do a detailed empirical evaluation of different design choices and show that our method requires 4x less operations and is also about 9 to 14x faster compared to the state of the art methods like ACVNet [1], LEAStereo [2] and CFNet [3] while giving comparable performance.

CVMar 24, 2025
Distilling Stereo Networks for Performant and Efficient Leaner Networks

Rafia Rahim, Samuel Woerz, Andreas Zell

Knowledge distillation has been quite popular in vision for tasks like classification and segmentation however not much work has been done for distilling state-of-the-art stereo matching methods despite their range of applications. One of the reasons for its lack of use in stereo matching networks is due to the inherent complexity of these networks, where a typical network is composed of multiple two- and three-dimensional modules. In this work, we systematically combine the insights from state-of-the-art stereo methods with general knowledge-distillation techniques to develop a joint framework for stereo networks distillation with competitive results and faster inference. Moreover, we show, via a detailed empirical analysis, that distilling knowledge from the stereo network requires careful design of the complete distillation pipeline starting from backbone to the right selection of distillation points and corresponding loss functions. This results in the student networks that are not only leaner and faster but give excellent performance . For instance, our student network while performing better than the performance oriented methods like PSMNet [1], CFNet [2], and LEAStereo [3]) on benchmark SceneFlow dataset, is 8x, 5x, and 8x faster respectively. Furthermore, compared to speed oriented methods having inference time less than 100ms, our student networks perform better than all the tested methods. In addition, our student network also shows better generalization capabilities when tested on unseen datasets like ETH3D and Middlebury.

CVAug 23, 2021
Separable Convolutions for Optimizing 3D Stereo Networks

Rafia Rahim, Faranak Shamsafar, Andreas Zell

Deep learning based 3D stereo networks give superior performance compared to 2D networks and conventional stereo methods. However, this improvement in the performance comes at the cost of increased computational complexity, thus making these networks non-practical for the real-world applications. Specifically, these networks use 3D convolutions as a major work horse to refine and regress disparities. In this work first, we show that these 3D convolutions in stereo networks consume up to 94% of overall network operations and act as a major bottleneck. Next, we propose a set of "plug-&-run" separable convolutions to reduce the number of parameters and operations. When integrated with the existing state of the art stereo networks, these convolutions lead up to 7x reduction in number of operations and up to 3.5x reduction in parameters without compromising their performance. In fact these convolutions lead to improvement in their performance in the majority of cases.

CVAug 21, 2018
Improving Super-Resolution Methods via Incremental Residual Learning

Muneeb Aadil, Rafia Rahim, Sibt ul Hussain

Recently, Convolutional Neural Networks (CNNs) have shown promising performance in super-resolution (SR). However, these methods operate primarily on Low Resolution (LR) inputs for memory efficiency but this limits, as we demonstrate, their ability to (i) model high frequency information; and (ii) smoothly translate from LR to High Resolution (HR) space. To this end, we propose a novel Incremental Residual Learning (IRL) framework to address these mentioned issues. In IRL, first we select a typical SR pre-trained network as a master branch. Next we sequentially train and add residual branches to the main branch, where each residual branch is learned to model accumulated residuals of all previous branches. We plug state of the art methods in IRL framework and demonstrate consistent performance improvement on public benchmark datasets to set a new state of the art for SR at only approximately 20% increase in training time.

MMNov 20, 2017
End-to-end Trained CNN Encode-Decoder Networks for Image Steganography

Atique ur Rehman, Rafia Rahim, M Shahroz Nadeem et al.

All the existing image steganography methods use manually crafted features to hide binary payloads into cover images. This leads to small payload capacity and image distortion. Here we propose a convolutional neural network based encoder-decoder architecture for embedding of images as payload. To this end, we make following three major contributions: (i) we propose a deep learning based generic encoder-decoder architecture for image steganography; (ii) we introduce a new loss function that ensures joint end-to-end training of encoder-decoder networks; (iii) we perform extensive empirical evaluation of proposed architecture on a range of challenging publicly available datasets (MNIST, CIFAR10, PASCAL-VOC12, ImageNet, LFW) and report state-of-the-art payload capacity at high PSNR and SSIM values.