CVDec 9, 2022
SupeRVol: Super-Resolution Shape and Reflectance Estimation in Inverse Volume RenderingMohammed Brahimi, Bjoern Haefner, Tarun Yenamandra et al.
We propose an end-to-end inverse rendering pipeline called SupeRVol that allows us to recover 3D shape and material parameters from a set of color images in a super-resolution manner. To this end, we represent both the bidirectional reflectance distribution function (BRDF) and the signed distance function (SDF) by multi-layer perceptrons. In order to obtain both the surface shape and its reflectance properties, we revert to a differentiable volume renderer with a physically based illumination model that allows us to decouple reflectance and lighting. This physical model takes into account the effect of the camera's point spread function thereby enabling a reconstruction of shape and material in a super-resolution quality. Experimental validation confirms that SupeRVol achieves state of the art performance in terms of inverse rendering quality. It generates reconstructions that are sharper than the individual input images, making this method ideally suited for 3D modeling from low-resolution imagery.
CVMar 29, 2024
Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric StereoMohammed Brahimi, Bjoern Haefner, Zhenzhang Ye et al.
Neural approaches have shown a significant progress on camera-based reconstruction. But they require either a fairly dense sampling of the viewing sphere, or pre-training on an existing dataset, thereby limiting their generalizability. In contrast, photometric stereo (PS) approaches have shown great potential for achieving high-quality reconstruction under sparse viewpoints. Yet, they are impractical because they typically require tedious laboratory conditions, are restricted to dark rooms, and often multi-staged, making them subject to accumulated errors. To address these shortcomings, we propose an end-to-end uncalibrated multi-view PS framework for reconstructing high-resolution shapes acquired from sparse viewpoints in a real-world environment. We relax the dark room assumption, and allow a combination of static ambient lighting and dynamic near LED lighting, thereby enabling easy data capture outside the lab. Experimental validation confirms that it outperforms existing baseline approaches in the regime of sparse viewpoints by a large margin. This allows to bring high-accuracy 3D reconstruction from the dark room to the real world, while maintaining a reasonable data capture complexity.
LGApr 24, 2025
Combining GCN Structural Learning with LLM Chemical Knowledge for Enhanced Virtual ScreeningRadia Berreziga, Mohammed Brahimi, Khairedine Kraim et al.
Virtual screening plays a critical role in modern drug discovery by enabling the identification of promising candidate molecules for experimental validation. Traditional machine learning methods such, as Support Vector Machines (SVM) and XGBoost, rely on predefined molecular representations, often leading to information loss and potential bias. In contrast, deep learning approaches-particularly Graph Convolutional Networks (GCNs)-offer a more expressive and unbiased alternative by operating directly on molecular graphs. Meanwhile, Large Language Models (LLMs) have recently demonstrated state-of-the-art performance in drug design, thanks to their capacity to capture complex chemical patterns from large-scale data via attention mechanisms. In this paper, we propose a hybrid architecture that integrates GCNs with LLM-derived embeddings to combine localized structural learning with global chemical knowledge. The LLM embeddings can be precomputed and stored in a molecular feature library, removing the need to rerun the LLM during training or inference and thus maintaining computational efficiency. We found that concatenating the LLM embeddings after each GCN layer-rather than only at the final layer-significantly improves performance, enabling deeper integration of global context throughout the network. The resulting model achieves superior results, with an F1-score of (88.8\%), outperforming standalone GCN (87.9%), XGBoost (85.5%), and SVM (85.4%) baselines.
71.5CVMar 31
GRVS: a Generalizable and Recurrent Approach to Monocular Dynamic View SynthesisThomas Tanay, Mohammed Brahimi, Michal Nazarczuk et al.
Synthesizing novel views from monocular videos of dynamic scenes remains a challenging problem. Scene-specific methods that optimize 4D representations with explicit motion priors often break down in highly dynamic regions where multi-view information is hard to exploit. Diffusion-based approaches that integrate camera control into large pre-trained models can produce visually plausible videos but frequently suffer from geometric inconsistencies across both static and dynamic areas. Both families of methods also require substantial computational resources. Building on the success of generalizable models for static novel view synthesis, we adapt the framework to dynamic inputs and propose a new model with two key components: (1) a recurrent loop that enables unbounded and asynchronous mapping between input and target videos and (2) an efficient use of plane sweeps over dynamic inputs to disentangle camera and scene motion, and achieve fine-grained, six-degrees-of-freedom camera controls. We train and evaluate our model on the UCSD dataset and on Kubric-4D-dyn, a new monocular dynamic dataset featuring longer, higher resolution sequences with more complex scene dynamics than existing alternatives. Our model outperforms four Gaussian Splatting-based scene-specific approaches, as well as two diffusion-based approaches in reconstructing fine-grained geometric details across both static and dynamic regions.
CVMar 7
AgrI Challenge: A Data-Centric AI Competition for Cross-Team Validation in Agricultural VisionMohammed Brahimi, Karim Laabassi, Mohamed Seghir Hadj Ameur et al.
Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV includes two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gap to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.
CVMar 8
Evaluating Synthetic Data for Baggage Trolley Detection in Airport LogisticsAbdeldjalil Taibi, Mohmoud Badlis, Amina Bensalem et al.
Efficient luggage trolley management is critical for reducing congestion and ensuring asset availability in modern airports. Automated detection systems face two main challenges. First, strict security and privacy regulations limit large-scale data collection. Second, existing public datasets lack the diversity, scale, and annotation quality needed to handle dense, overlapping trolley arrangements typical of real-world operations. To address these limitations, we introduce a synthetic data generation pipeline based on a high-fidelity Digital Twin of Algiers International Airport using NVIDIA Omniverse. The pipeline produces richly annotated data with oriented bounding boxes, capturing complex trolley formations, including tightly nested chains. We evaluate YOLO-OBB using five training strategies: real-only, synthetic-only, linear probing, full fine-tuning, and mixed training. This allows us to assess how synthetic data can complement limited real-world annotations. Our results show that mixed training with synthetic data and only 40 percent of real annotations matches or exceeds the full real-data baseline, achieving 0.94 mAP@50 and 0.77 mAP@50-95, while reducing annotation effort by 25 to 35 percent. Multi-seed experiments confirm strong reproducibility with a standard deviation below 0.01 on mAP@50, demonstrating the practical effectiveness of synthetic data for automated trolley detection.
CVNov 24, 2025
Rethinking Plant Disease Diagnosis: Bridging the Academic-Practical Gap with Vision Transformers and Zero-Shot LearningWassim Benabbas, Mohammed Brahimi, Samir Akhrouf et al.
Recent advances in deep learning have enabled significant progress in plant disease classification using leaf images. Much of the existing research in this field has relied on the PlantVillage dataset, which consists of well-centered plant images captured against uniform, uncluttered backgrounds. Although models trained on this dataset achieve high accuracy, they often fail to generalize to real-world field images, such as those submitted by farmers to plant diagnostic systems. This has created a significant gap between published studies and practical application requirements, highlighting the necessity of investigating and addressing this issue. In this study, we investigate whether attention-based architectures and zero-shot learning approaches can bridge the gap between curated academic datasets and real-world agricultural conditions in plant disease classification. We evaluate three model categories: Convolutional Neural Networks (CNNs), Vision Transformers, and Contrastive Language-Image Pre-training (CLIP)-based zero-shot models. While CNNs exhibit limited robustness under domain shift, Vision Transformers demonstrate stronger generalization by capturing global contextual features. Most notably, CLIP models classify diseases directly from natural language descriptions without any task-specific training, offering strong adaptability and interpretability. These findings highlight the potential of zero-shot learning as a practical and scalable domain adaptation strategy for plant health diagnosis in diverse field environments.
CVMar 12, 2025
Better Together: Unified Motion Capture and 3D Avatar ReconstructionArthur Moreau, Mohammed Brahimi, Richard Shaw et al.
We present Better Together, a method that simultaneously solves the human pose estimation problem while reconstructing a photorealistic 3D human avatar from multi-view videos. While prior art usually solves these problems separately, we argue that joint optimization of skeletal motion with a 3D renderable body model brings synergistic effects, i.e. yields more precise motion capture and improved visual quality of real-time rendering of avatars. To achieve this, we introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh and propose to optimize the motion sequence with time-dependent MLPs that provide accurate and temporally consistent pose estimates. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation, reducing error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. At the same time, our method significantly boosts the visual quality of animatable avatars (+2dB PSNR on novel view synthesis) on diverse challenging subjects.
CVMay 31, 2023
Exploring Regions of Interest: Visualizing Histological Image Classification for Breast Cancer using Deep LearningImane Nedjar, Mohammed Brahimi, Said Mahmoudi et al.
Computer aided detection and diagnosis systems based on deep learning have shown promising performance in breast cancer detection. However, there are cases where the obtained results lack justification. In this study, our objective is to highlight the regions of interest used by a convolutional neural network (CNN) for classifying histological images as benign or malignant. We compare these regions with the regions identified by pathologists. To achieve this, we employed the VGG19 architecture and tested three visualization methods: Gradient, LRP Z, and LRP Epsilon. Additionally, we experimented with three pixel selection methods: Bins, K-means, and MeanShift. Based on the results obtained, the Gradient visualization method and the MeanShift selection method yielded satisfactory outcomes for visualizing the images.
CVNov 17, 2019
On the well-posedness of uncalibrated photometric stereo under general lightingMohammed Brahimi, Yvain Quéau, Bjoern Haefner et al.
Uncalibrated photometric stereo aims at estimating the 3D-shape of a surface, given a set of images captured from the same viewing angle, but under unknown, varying illumination. While the theoretical foundations of this inverse problem under directional lighting are well-established, there is a lack of mathematical evidence for the uniqueness of a solution under general lighting. On the other hand, stable and accurate heuristical solutions of uncalibrated photometric stereo under such general lighting have recently been proposed. The quality of the results demonstrated therein tends to indicate that the problem may actually be well-posed, but this still has to be established. The present paper addresses this theoretical issue, considering first-order spherical harmonics approximation of general lighting. Two important theoretical results are established. First, the orthographic integrability constraint ensures uniqueness of a solution up to a global concave-convex ambiguity, which had already been conjectured, yet not proven. Second, the perspective integrability constraint makes the problem well-posed, which generalizes a previous result limited to directional lighting. Eventually, a closed-form expression for the unique least-squares solution of the problem under perspective projection is provided, allowing numerical simulations on synthetic data to empirically validate our findings.
CVMay 31, 2019
Deep interpretable architecture for plant diseases classificationMohammed Brahimi, Said Mahmoudi, Kamel Boukhalfa et al.
Recently, many works have been inspired by the success of deep learning in computer vision for plant diseases classification. Unfortunately, these end-to-end deep classifiers lack transparency which can limit their adoption in practice. In this paper, we propose a new trainable visualization method for plant diseases classification based on a Convolutional Neural Network (CNN) architecture composed of two deep classifiers. The first one is named Teacher and the second one Student. This architecture leverages the multitask learning to train the Teacher and the Student jointly. Then, the communicated representation between the Teacher and the Student is used as a proxy to visualize the most important image regions for classification. This new architecture produces sharper visualization than the existing methods in plant diseases context. All experiments are achieved on PlantVillage dataset that contains 54306 plant images.