Mariano Rivera

CV
h-index3
7papers
7citations
Novelty54%
AI Score29

7 Papers

CVFeb 20, 2023
Hadamard Layer to Improve Semantic Segmentation

Angello Hoyos, Mariano Rivera

The Hadamard Layer, a simple and computationally efficient way to improve results in semantic segmentation tasks, is presented. This layer has no free parameters that require to be trained. Therefore it does not increase the number of model parameters, and the extra computational cost is marginal. Experimental results show that the new Hadamard layer substantially improves the performance of the investigated models (variants of the Pix2Pix model). The performance's improvement can be explained by the Hadamard layer forcing the network to produce an internal encoding of the classes so that all bins are active. Therefore, the network computation is more distributed. In a sort that the Hadamard layer requires that to change the predicted class, it is necessary to modify $2^{k-1}$ bins, assuming $k$ bins in the encoding. A specific loss function allows a stable and fast training convergence.

CVSep 20, 2023
Attentive VQ-VAE

Angello Hoyos, Mariano Rivera

We present a novel approach to enhance the capabilities of VQ-VAE models through the integration of a Residual Encoder and a Residual Pixel Attention layer, named Attentive Residual Encoder (AREN). The objective of our research is to improve the performance of VQ-VAE while maintaining practical parameter levels. The AREN encoder is designed to operate effectively at multiple levels, accommodating diverse architectural complexities. The key innovation is the integration of an inter-pixel auto-attention mechanism into the AREN encoder. This approach allows us to efficiently capture and utilize contextual information across latent vectors. Additionally, our models uses additional encoding levels to further enhance the model's representational power. Our attention layer employs a minimal parameter approach, ensuring that latent vectors are modified only when pertinent information from other pixels is available. Experimental results demonstrate that our proposed modifications lead to significant improvements in data representation and generation, making VQ-VAEs even more suitable for a wide range of applications as the presented.

LGSep 22, 2023
How to train your VAE

Mariano Rivera

Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning. This paper explores a nuanced aspect of VAEs, focusing on interpreting the Kullback-Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization. Meanwhile, the KL Divergence enforces alignment between latent variable distributions and a prior imposing a structure on the overall latent space but leaves individual variable distributions unconstrained. The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. Implementation details involve ResNetV2 architectures for both the Encoder and Decoder. The experiments demonstrate the ability to generate realistic faces, offering a promising solution for enhancing VAE-based generative models.

CVOct 2, 2023
EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution

Esteban Reyes-Saldana, Mariano Rivera

Recent Reference-Based image super-resolution (RefSR) has improved SOTA deep methods introducing attention mechanisms to enhance low-resolution images by transferring high-resolution textures from a reference high-resolution image. The main idea is to search for matches between patches using LR and Reference image pair in a feature space and merge them using deep architectures. However, existing methods lack the accurate search of textures. They divide images into as many patches as possible, resulting in inefficient memory usage, and cannot manage large images. Herein, we propose a deep search with a more efficient memory usage that reduces significantly the number of image patches and finds the $k$ most relevant texture match for each low-resolution patch over the high-resolution reference patches, resulting in an accurate texture match. We enhance the Super Resolution result adding gradient density information using a simple residual architecture showing competitive metrics results: PSNR and SSMI.

CVMay 23, 2025
COLORA: Efficient Fine-Tuning for Convolutional Models with a Study Case on Optical Coherence Tomography Image Classification

Mariano Rivera, Angello Hoyos

We introduce CoLoRA (Convolutional Low-Rank Adaptation), a parameter-efficient fine-tuning method for convolutional neural networks (CNNs). CoLoRA extends LoRA to convolutional layers by decomposing kernel updates into lightweight depthwise and pointwise components.This design reduces the number of trainable parameters to 0.2 compared to conventional fine-tuning, preserves the original model size, and allows merging updates into the pretrained weights after each epoch, keeping inference complexity unchanged. On OCTMNISTv2, CoLoRA applied to VGG16 and ResNet50 achieves up to 1 percent accuracy and 0.013 AUC improvements over strong baselines (Vision Transformers, state-space, and Kolmogorov Arnold models) while reducing per-epoch training time by nearly 20 percent. Results indicate that CoLoRA provides a stable and effective alternative to full fine-tuning for medical image classification.

IVMar 19, 2021
AxonNet: A self-supervised Deep Neural Network for Intravoxel Structure Estimation from DW-MRI

Hanna Ehrlich, Mariano Rivera

We present a method for estimating intravoxel parameters from a DW-MRI based on deep learning techniques. We show that neural networks (DNNs) have the potential to extract information from diffusion-weighted signals to reconstruct cerebral tracts. We present two DNN models: one that estimates the axonal structure in the form of a voxel and the other to calculate the structure of the central voxel using the voxel neighborhood. Our methods are based on a proposed parameter representation suitable for the problem. Since it is practically impossible to have real tagged data for any acquisition protocol, we used a self-supervised strategy. Experiments with synthetic data and real data show that our approach is competitive, and the computational times show that our approach is faster than the SOTA methods, even if training times are considered. This computational advantage increases if we consider the prediction of multiple images with the same acquisition protocol.

IVJun 14, 2019
Deep neural network for fringe pattern filtering and normalisation

Alan Reyes-Figueroa, Mariano Rivera

We propose a new framework for processing Fringe Patterns (FP). Our novel approach builds upon the hypothesis that the denoising and normalisation of FPs can be learned by a deep neural network if enough pairs of corrupted and ideal FPs are provided. The main contributions of this paper are the following: (1) We propose the use of the U-net neural network architecture for FP normalisation tasks; (2) we propose a modification for the distribution of weights in the U-net, called here the V-net model, which is more convenient for reconstruction tasks, and we conduct extensive experimental evidence in which the V-net produces high-quality results for FP filtering and normalisation. (3) We also propose two modifications of the V-net scheme, namely, a residual version called ResV-net and a fast operating version of the V-net, to evaluate the potential improvements when modify our proposal. We evaluate the performance of our methods in various scenarios: FPs corrupted with different degrees of noise, and corrupted with different noise distributions. We compare our methodology versus other state-of-the-art methods. The experimental results (on both synthetic and real data) demonstrate the capabilities and potential of this new paradigm for processing interferograms.