David Castells-Rufas

CV
h-index: 11
Papers: 6
Citations: 34
Novelty: 41%
AI Score: 41

6 Papers

IV · Jan 1 · Code
MetaFormer-driven Encoding Network for Robust Medical Semantic Segmentation

Le-Anh Tran, Chung Nguyen Tran, Nhan Cach Dang et al.

Semantic segmentation is crucial for medical image analysis, enabling precise disease diagnosis and treatment planning. However, many advanced models employ complex architectures, limiting their use in resource-constrained clinical settings. This paper proposes MFEnNet, an efficient medical image segmentation framework that incorporates MetaFormer in the encoding phase of the U-Net backbone. MetaFormer, an architectural abstraction of vision transformers, provides a versatile alternative to convolutional neural networks by transforming tokenized image patches into sequences for global context modeling. To mitigate the substantial computational cost associated with self-attention, the proposed framework replaces conventional transformer modules with pooling transformer blocks, thereby achieving effective global feature aggregation at reduced complexity. In addition, Swish activation is used to achieve smoother gradients and faster convergence, while spatial pyramid pooling is incorporated at the bottleneck to improve multi-scale feature extraction. Comprehensive experiments on different medical segmentation benchmarks demonstrate that the proposed MFEnNet approach attains competitive accuracy while significantly lowering computational cost compared to state-of-the-art models. The source code for this work is available at https://github.com/tranleanh/mfennet.
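Two of the ingredients named in the abstract, Swish activation and pooling in place of self-attention, can be illustrated in a few lines. This is a minimal pure-Python sketch, not the MFEnNet implementation: the function names `swish` and `pool_mix` are hypothetical, and the "local average minus the token itself" form follows the commonly described pooling token mixer, which may differ from what the paper uses.

```python
import math


def swish(x: float) -> float:
    """Swish activation x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def pool_mix(grid, k=3):
    """Pooling token mixer (assumed form): local average minus the token itself.

    `grid` is a 2D list of floats, standing in for one channel of a feature map.
    Border windows are averaged only over the cells that fall inside the grid.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                grid[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            out[i][j] = sum(window) / len(window) - grid[i][j]
    return out
```

The mixer has no learned parameters and its cost grows linearly with the number of tokens, which is the usual argument for preferring it over self-attention when compute is limited.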

CV · Apr 25, 2022
BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation

Juan Borrego-Carazo, Carles Sánchez, David Castells-Rufas et al.

Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions. In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy.

CV · Jul 4, 2025 · Code
Low-Light Enhancement via Encoder-Decoder Network with Illumination Guidance

Le-Anh Tran, Chung Nguyen Tran, Ngoc-Luu Nguyen et al.

This paper introduces a novel deep learning framework for low-light image enhancement, named the Encoder-Decoder Network with Illumination Guidance (EDNIG). Building upon the U-Net architecture, EDNIG integrates an illumination map, derived from Bright Channel Prior (BCP), as a guidance input. This illumination guidance helps the network focus on underexposed regions, effectively steering the enhancement process. To further improve the model's representational power, a Spatial Pyramid Pooling (SPP) module is incorporated to extract multi-scale contextual features, enabling better handling of diverse lighting conditions. Additionally, the Swish activation function is employed to ensure smoother gradient propagation during training. EDNIG is optimized within a Generative Adversarial Network (GAN) framework using a composite loss function that combines adversarial loss, pixel-wise mean squared error (MSE), and perceptual loss. Experimental results show that EDNIG achieves competitive performance compared to state-of-the-art methods in quantitative metrics and visual quality, while maintaining lower model complexity, demonstrating its suitability for real-world applications. The source code for this work is available at https://github.com/tranleanh/ednig.
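The abstract does not say exactly how the illumination map is computed from the Bright Channel Prior, so the following is only a sketch under a common definition of the bright channel: the maximum over the colour channels, taken again over a small local patch. The names `bright_channel` and `composite_loss`, the patch size, and the loss weights are all assumptions made for illustration.

```python
def bright_channel(image, patch=3):
    """Per-pixel maximum over RGB and over a patch x patch neighbourhood.

    `image` is a 2D list of (r, g, b) tuples with values in [0, 1].
    """
    h, w = len(image), len(image[0])
    r = patch // 2
    # Step 1: brightest colour channel at each pixel.
    channel_max = [[max(px) for px in row] for row in image]
    # Step 2: maximum of that map over the local window, clipped at the borders.
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = max(
                channel_max[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
    return out


def composite_loss(adv, mse, perceptual, w_adv=1.0, w_mse=1.0, w_perc=1.0):
    """Weighted sum of the three loss terms; the weights are placeholders."""
    return w_adv * adv + w_mse * mse + w_perc * perceptual
```

Low values in the resulting map mark underexposed regions, which is the kind of signal a guidance input could use to steer enhancement.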

COMP-PH · Nov 23, 2024
Capacitive Touch Sensor Modeling With a Physics-informed Neural Network and Maxwell's Equations

Ganyong Mo, Krishna Kumar Narayanan, David Castells-Rufas et al.

Maxwell's equations are the fundamental equations for understanding electric and magnetic field interactions and play a crucial role in designing and optimizing sensor systems like capacitive touch sensors, which are widely prevalent in automotive switches and smartphones. Ensuring robust functionality and stability of the sensors in dynamic environments necessitates profound domain expertise and computationally intensive multi-physics simulations. This paper introduces a novel approach using a Physics-Informed Neural Network (PINN) based surrogate model to accelerate the design process. The PINN model solves the governing electrostatic equations describing the interaction between a finger and a capacitive sensor. Inputs include spatial coordinates from a 3D domain encompassing the finger, sensor, and PCB, along with finger distances. By incorporating the electrostatic equations directly into the neural network's loss function, the model captures the underlying physics. The learned model thus serves as a surrogate sensor model on which inference can be carried out in seconds for different experimental setups without the need to run simulations. Efficacy results evaluated on unseen test cases demonstrate the significant potential of PINNs in accelerating the development and design optimization of capacitive touch sensors.
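As a rough illustration of what "incorporating the electrostatic equations into the loss function" can look like, a generic PINN objective for a charge-free electrostatic problem is sketched below. The symbols are assumptions, not taken from the paper: $\phi_\theta$ is the network's potential, $\varepsilon$ the permittivity, $d$ the finger distance, $\phi_D$ the prescribed electrode potential, and $\lambda$ a weighting factor.

```latex
% Governing equation (Gauss's law in potential form, no free charge):
\nabla \cdot \bigl( \varepsilon(\mathbf{x}) \, \nabla \phi(\mathbf{x}) \bigr) = 0
\quad \text{in } \Omega,
\qquad
\phi = \phi_D \quad \text{on } \Gamma_D .

% Assumed PINN loss: PDE residual at N_r collocation points
% plus boundary mismatch at N_b boundary points.
\mathcal{L}(\theta)
  = \frac{1}{N_r} \sum_{i=1}^{N_r}
      \Bigl| \nabla \cdot \bigl( \varepsilon(\mathbf{x}_i) \,
      \nabla \phi_\theta(\mathbf{x}_i, d_i) \bigr) \Bigr|^2
  + \lambda \, \frac{1}{N_b} \sum_{j=1}^{N_b}
      \bigl| \phi_\theta(\mathbf{x}_j, d_j) - \phi_D(\mathbf{x}_j) \bigr|^2 .
```

Because the finger distance enters as a network input, a single trained model of this kind could be queried for new distances without rerunning a simulation, which matches the surrogate role the abstract describes.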

CV · Mar 8, 2019
OpenCL-based FPGA accelerator for disparity map generation with stereoscopic event cameras

David Castells-Rufas, Jordi Carrabina

Although event-based cameras are already commercially available, vision algorithms based on them are still not common. As a consequence, there are few hardware accelerators for them. In this work we present some experiments to create FPGA accelerators for a well-known vision algorithm using event-based cameras. We present a stereo matching algorithm that creates a disparity map as a stream of disparity events, and implement several accelerators using the Intel FPGA OpenCL tool-chain. The results show that multiple designs can be easily tested and that a performance speedup of more than 8x can be achieved with simple code transformations.

CV · Jan 12, 2018
A High-Performance HOG Extractor on FPGA

Vinh Ngo, Arnau Casadevall, Marc Codina et al.

Pedestrian detection is one of the key problems in the emerging self-driving car industry, and the HOG algorithm has proven to provide good accuracy for it. Much research has been done on accelerating the HOG algorithm on FPGAs because of their low-power and high-throughput characteristics. In this paper, we present a high-performance HOG architecture for pedestrian detection on a low-cost FPGA platform. It achieves a maximum throughput of 526 FPS with 640x480 input images, which is 3.25 times faster than the state-of-the-art design. The accelerator is integrated with SVM-based prediction to realize a pedestrian detection system. The power consumption of the whole system is comparable with the best existing implementations.
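The throughput figures in the abstract can be turned into a pixel rate and an implied reference frame rate with simple arithmetic. The sketch below assumes that every pixel of each 640x480 frame is processed and that the 3.25x comparison is made at the same resolution; neither detail is stated in the abstract.

```python
WIDTH, HEIGHT = 640, 480   # input resolution from the abstract
FPS = 526                  # reported maximum throughput
SPEEDUP = 3.25             # reported factor over the prior design

pixels_per_frame = WIDTH * HEIGHT      # 307200 pixels
pixel_rate = FPS * pixels_per_frame    # pixels processed per second
baseline_fps = FPS / SPEEDUP           # frame rate implied for the prior design

print(f"{pixel_rate / 1e6:.1f} Mpixel/s")            # 161.6 Mpixel/s
print(f"{baseline_fps:.1f} FPS implied baseline")    # 161.8 FPS implied baseline
```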