Gibran Fuentes-Pineda

CV
h-index: 10
9 papers
110 citations
Novelty: 48%
AI Score: 35

9 Papers

CV · Oct 14, 2022
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification

Ricardo Montalvo-Lezama, Berenice Montalvo-Lezama, Gibran Fuentes-Pineda

In this paper, we study the transferability of ImageNet spatial and Kinetics spatio-temporal representations to multi-label Movie Trailer Genre Classification (MTGC). In particular, we present an extensive evaluation of the transferability of ConvNet and Transformer models pretrained on ImageNet and Kinetics to Trailers12k, a new manually-curated movie trailer dataset composed of 12,000 videos labeled with 10 different genres and associated metadata. We analyze different aspects that can influence transferability, such as frame rate, input video extension, and spatio-temporal modeling. In order to reduce the spatio-temporal structure gap between ImageNet/Kinetics and Trailers12k, we propose Dual Image and Video Transformer Architecture (DIViTA), which performs shot detection so as to segment the trailer into highly correlated clips, providing a more cohesive input for pretrained backbones and improving transferability (a 1.83% increase for ImageNet and 3.75% for Kinetics). Our results demonstrate that representations learned on either ImageNet or Kinetics are comparatively transferable to Trailers12k. Moreover, both datasets provide complementary information that can be combined to improve classification performance (a 2.91% gain compared to the top single pretraining). Interestingly, using lightweight ConvNets as pretrained backbones resulted in only a 3.46% drop in classification performance compared with the top Transformer while requiring only 11.82% of its parameters and 0.81% of its FLOPS.

CV · Jan 17, 2024
Efficient generative adversarial networks using linear additive-attention Transformers

Emilio Morales-Juarez, Gibran Fuentes-Pineda

Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint for training, fine-tuning, and inference. In this work, we present a novel GAN architecture which we call LadaGAN. This architecture is based on a linear attention Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions while being significantly more efficient. Moreover, LadaGAN shows competitive performance compared to state-of-the-art multi-step generative models (e.g. DMs) using orders of magnitude less computational resources.
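
To make the idea of a single attention vector per head concrete, the following is a minimal sketch of one additive-attention head. It is an assumption-laden illustration, not LadaGAN's exact formulation: the scoring vector `w`, the element-wise product with the values, and the function name `additive_attention` are all hypothetical choices. The point it shows is that the cost grows linearly with the number of tokens, because no token-to-token score matrix is ever built.

```python
import math

def additive_attention(queries, values, w):
    """One additive-attention head (illustrative sketch, not the paper's code).

    queries, values: lists of n token vectors, each of length d.
    w: hypothetical learned scoring vector of length d.
    Returns (outputs, alphas); cost is O(n * d) instead of O(n^2 * d).
    """
    d = len(w)
    # One scalar score per token, scaled as in dot-product attention.
    scores = [sum(qj * wj for qj, wj in zip(q, w)) / math.sqrt(d) for q in queries]
    # Numerically stable softmax over the n tokens.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # A single global vector for the whole head: weighted sum of the queries.
    global_q = [sum(a * q[j] for a, q in zip(alphas, queries)) for j in range(d)]
    # Each token output modulates its value by the shared global vector.
    outputs = [[g * vj for g, vj in zip(global_q, v)] for v in values]
    return outputs, alphas

# Toy input: two tokens of dimension two; a zero scoring vector gives uniform weights.
outputs, alphas = additive_attention(
    queries=[[1.0, 0.0], [3.0, 2.0]],
    values=[[1.0, 1.0], [2.0, 0.5]],
    w=[0.0, 0.0],
)
```

With a zero scoring vector the weights are uniform, so the global vector is simply the mean of the queries, which makes the toy case easy to check by hand.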

CV · Sep 29, 2025
MetaChest: Generalized few-shot learning of pathologies from chest X-rays

Berenice Montalvo-Lezama, Gibran Fuentes-Pineda

The limited availability of annotated data presents a major challenge for applying deep learning methods to medical image analysis. Few-shot learning methods aim to recognize new classes from only a small number of labeled examples. These methods are typically studied under the standard few-shot learning setting, where all classes in a task are new. However, medical applications such as pathology classification from chest X-rays often require learning new classes while simultaneously leveraging knowledge of previously known ones, a scenario more closely aligned with generalized few-shot classification. Despite its practical relevance, few-shot learning has been scarcely studied in this context. In this work, we present MetaChest, a large-scale dataset of 479,215 chest X-rays collected from four public databases. MetaChest includes a meta-set partition specifically designed for standard few-shot classification, as well as an algorithm for generating multi-label episodes. We conduct extensive experiments evaluating both a standard transfer learning approach and an extension of ProtoNet across a wide range of few-shot multi-label classification tasks. Our results demonstrate that increasing the number of classes per episode and the number of training examples per class improves classification performance. Notably, the transfer learning approach consistently outperforms the ProtoNet extension, despite not being tailored for few-shot learning. We also show that higher-resolution images improve accuracy at the cost of additional computation, while efficient model architectures achieve comparable performance to larger models with significantly reduced resource requirements.
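
For readers unfamiliar with ProtoNet, the sketch below shows the standard single-label prototype rule that the abstract's extension builds on: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The multi-label extension evaluated in the paper is not reproduced here. The class names and the two-dimensional embeddings are hypothetical toy values.

```python
def build_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        protos[label] = [
            sum(e[j] for e in embeddings) / len(embeddings) for j in range(dim)
        ]
    return protos

def classify(query, protos):
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sqdist(query, protos[label]))

# Hypothetical 2-D embeddings for a 2-way, 2-shot episode.
support = {
    "effusion": [[0.0, 0.0], [0.0, 2.0]],
    "nodule": [[4.0, 4.0], [6.0, 4.0]],
}
protos = build_prototypes(support)
pred_a = classify([1.0, 1.0], protos)
pred_b = classify([4.0, 5.0], protos)
```

In practice the embeddings would come from a trained image encoder; here they are written out by hand so the distances can be verified directly.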

LG · Jan 6, 2021
Risk markers by sex for in-hospital mortality in patients with acute coronary syndrome: a machine learning approach

Blanca Vazquez, Gibran Fuentes-Pineda, Fabian Garcia et al.

Background: Several studies have highlighted the importance of considering sex differences in the diagnosis and treatment of Acute Coronary Syndrome (ACS). However, the identification of sex-specific risk markers in ACS sub-populations has been scarcely studied. The present study aims to explore machine learning (ML) models to identify in-hospital mortality markers for women and men in ACS sub-populations collected from a public database of electronic health records (EHR). Methods: We extracted 1,299 patients with ST-elevation myocardial infarction (STEMI) and 2,820 patients with non-ST-elevation myocardial infarction (NSTEMI) from the Medical Information Mart for Intensive Care (MIMIC)-III database. We trained and validated mortality prediction models and used an interpretability technique to identify sex-specific markers for each sub-population. Results: The models based on eXtreme Gradient Boosting (XGBoost) achieved the highest performance: area under the curve (AUC) = 0.94 (95% CI: 0.84-0.96) for STEMI and AUC = 0.94 (95% CI: 0.80-0.90) for NSTEMI. For STEMI, the top markers in women are chronic kidney failure, high heart rate, and age over 70 years. For men, the top markers are acute kidney failure, high troponin T levels, and age over 75 years. However, for NSTEMI, the top markers in women are low troponin levels, high urea levels, and age over 80 years. For men, the top markers are high heart rate, creatinine levels, and age over 70 years. Conclusions: Our results show possible significant and coherent sex-specific risk markers of different ACS sub-populations by interpreting ML mortality models trained on EHRs. Differences are observed in the identified risk markers between women and men, highlighting the importance of considering sex-specific markers in implementing more appropriate treatment strategies and better clinical outcomes.

AS · Mar 6, 2020
Lightweight Speaker Verification for Online Identification of New Speakers with Short Segments

Ivette Velez, Caleb Rascon, Gibran Fuentes-Pineda

Verifying whether two audio segments belong to the same speaker has recently been put forward as a flexible way to carry out speaker identification, since it does not require re-training when new speakers appear in the auditory scene. Although many of the current techniques have achieved high performance, they require a considerable amount of memory and a specific minimum length for their input audio segments. These requirements limit the applicability of these techniques in scenarios such as service robots, internet of things and virtual assistants, where computational resources are limited and the users tend to speak in short segments. In this work we propose a BLSTM-based model that reaches a level of performance comparable to the current state of the art when using short input audio segments, while requiring considerably less memory. Further, as far as we know, a complete speaker identification system has not been reported using this verification paradigm. Thus, we present a complete online speaker identifier, based on a simple voting system, which shows that the proposed BLSTM-based model achieves performance similar to the current state of the art when identifying speakers online.
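
One way a verification model could be turned into an identifier through voting is sketched below. This is a guess at the general shape of such a system, not the authors' implementation: the function `identify`, the `min_votes` threshold, and the scalar stand-in "embeddings" used by `toy_verifier` are all hypothetical. Each stored reference segment that the verifier accepts casts one vote for its speaker; a segment with too few votes is treated as coming from a new speaker.

```python
from collections import Counter

def identify(segment, enrolled, same_speaker, min_votes=2):
    """Return the enrolled speaker with the most accepted verifications, else None.

    enrolled: dict mapping speaker id -> list of stored reference segments.
    same_speaker: callable(segment_a, segment_b) -> bool (the verification model).
    min_votes: hypothetical threshold below which the speaker is considered new.
    """
    votes = Counter()
    for speaker, references in enrolled.items():
        for ref in references:
            if same_speaker(segment, ref):
                votes[speaker] += 1
    if not votes:
        return None
    speaker, count = votes.most_common(1)[0]
    return speaker if count >= min_votes else None

def toy_verifier(a, b, threshold=0.5):
    """Stand-in for a learned verifier: compares scalar pseudo-embeddings."""
    return abs(a - b) < threshold

enrolled = {"alice": [1.0, 1.1, 0.9], "bob": [5.0, 5.2, 4.9]}
known = identify(1.05, enrolled, toy_verifier)
unknown = identify(3.0, enrolled, toy_verifier)
```

Because nothing is re-trained when a new speaker appears, enrolling that speaker amounts to adding a new key with its first segments to `enrolled`.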

LG · Sep 16, 2019
A few filters are enough: Convolutional Neural Network for P300 Detection

Alicia Montserrat Alvarado-Gonzalez, Gibran Fuentes-Pineda, Jorge Cervantes-Ojeda

Over the past decade, convolutional neural networks (CNNs) have become the driving force of an ever-increasing set of applications, achieving state-of-the-art performance. Most modern CNN architectures are composed of many convolutional and fully connected layers and typically require thousands or millions of parameters to learn. CNNs have also been effective in the detection of Event-Related Potentials from electroencephalogram (EEG) signals, notably the P300 component, which is frequently employed in Brain-Computer Interfaces (BCIs). However, for this task, the increase in detection rates compared to approaches based on human-engineered features has not been as impressive as in other areas and might not justify such a large number of parameters. In this paper, we study the performance of existing CNN architectures with diverse complexities for single-trial within-subject and cross-subject P300 detection on four different datasets. We also propose SepConv1D, a very simple CNN architecture consisting of a single depthwise separable 1D convolutional layer followed by a fully connected sigmoid classification neuron. We found that with as few as four filters in its convolutional layer and a small overall number of parameters, SepConv1D obtained competitive performance on the four datasets. We believe this may represent an important step towards building simpler, cheaper, faster, and more portable BCIs.
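
To give a sense of how small such a model can be, the sketch below counts the trainable parameters of a network with one depthwise separable 1D convolution followed by a single sigmoid neuron. The input shape, kernel size, and stride are hypothetical values chosen for illustration, not the paper's configuration. The count assumes one depthwise filter per input channel, a bias only on the pointwise output, and "same" padding.

```python
import math

def sepconv1d_param_count(channels, samples, filters, kernel_size, stride=1):
    """Trainable parameters of: separable 1D conv -> flatten -> 1 sigmoid neuron."""
    depthwise = kernel_size * channels      # one temporal filter per input channel
    pointwise = channels * filters          # 1x1 convolution mixing the channels
    conv_bias = filters                     # one bias per output filter
    out_len = math.ceil(samples / stride)   # output length under "same" padding
    dense = out_len * filters + 1           # weights plus bias of the output neuron
    return depthwise + pointwise + conv_bias + dense

# Hypothetical EEG epoch: 8 channels, 128 samples, 4 filters, kernel 16, stride 8.
total = sepconv1d_param_count(channels=8, samples=128, filters=4,
                              kernel_size=16, stride=8)
print(total)  # 229 = 128 + 32 + 4 + 65
```

For comparison, a standard (non-separable) convolution with the same shape would need `kernel_size * channels * filters` weights, 512 here, in the convolutional layer alone.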

CL · Jul 3, 2018
Topic Discovery in Massive Text Corpora Based on Min-Hashing

Gibran Fuentes-Pineda, Ivan Vladimir Meza-Ruiz

The task of discovering topics in text corpora has been dominated by Latent Dirichlet Allocation and other Topic Models for over a decade. In order to apply these approaches to massive text corpora, the vocabulary needs to be reduced considerably and large computer clusters and/or GPUs are typically required. Moreover, the number of topics must be provided beforehand, but this depends on the corpus characteristics and is often difficult to estimate, especially for massive text corpora. Unfortunately, both topic quality and time complexity are sensitive to this choice. This paper describes an alternative approach to topic discovery based on Min-Hashing, which can handle massive text corpora and large vocabularies using modest computer hardware and does not require fixing the number of topics in advance. The basic idea is to generate multiple random partitions of the corpus vocabulary to find sets of highly co-occurring words, which are then clustered to produce the final topics. In contrast to probabilistic topic models where topics are distributions over the complete vocabulary, the topics discovered by the proposed approach are sets of highly co-occurring words. Interestingly, these topics underlie various themes with different levels of granularity. An extensive qualitative and quantitative evaluation using the 20 Newsgroups (18K), Reuters (800K), Spanish Wikipedia (1M), and English Wikipedia (5M) corpora shows that the proposed approach is able to consistently discover meaningful and coherent topics. Remarkably, the time complexity of the proposed approach is linear with respect to corpus and vocabulary size; a non-parallel implementation was able to discover topics from the entire English edition of Wikipedia with over 5 million documents and 1 million words in less than 7 hours.
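
The partitioning step can be illustrated with plain Min-Hashing over a toy inverted index, where each word is represented by the set of documents in which it occurs. This is a minimal sketch under stated assumptions, not the paper's implementation: the helper names, the use of `hashlib.blake2b` as the hash family, and the toy index are hypothetical, and the later clustering of co-occurring sets into topics is omitted. Words whose min-hash values agree for every seed fall into the same cell, which happens with high probability only when their document sets overlap heavily.

```python
import hashlib
from collections import defaultdict

def hash_doc(seed, doc_id):
    """Deterministic 64-bit hash of a document id under a given seed."""
    data = f"{seed}:{doc_id}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(doc_ids, seed):
    """Min-hash of a document set: the smallest hash value among its members."""
    return min(hash_doc(seed, d) for d in doc_ids)

def partition_vocabulary(word_docs, seeds):
    """Group words whose min-hash values agree for every seed (one random partition)."""
    cells = defaultdict(set)
    for word, doc_ids in word_docs.items():
        key = tuple(minhash(doc_ids, s) for s in seeds)
        cells[key].add(word)
    # Keep only cells holding at least two co-occurring words.
    return [cell for cell in cells.values() if len(cell) > 1]

# Toy inverted index: word -> ids of the documents containing it.
word_docs = {
    "goal": {1, 2, 3, 4},
    "match": {1, 2, 3, 4},
    "senate": {7, 8, 9},
    "vote": {7, 8, 9},
}
cells = partition_vocabulary(word_docs, seeds=[0, 1])
```

For a single seed, two words collide with probability equal to the Jaccard similarity of their document sets, so using several seeds per key makes a partition stricter, while repeating the procedure with fresh seeds recovers pairs that a single partition misses.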

CL · Jun 3, 2018
Contextualize, Show and Tell: A Neural Visual Storyteller

Diana Gonzalez-Rico, Gibran Fuentes-Pineda

We present a neural model for generating short stories from image sequences, which extends the image description model by Vinyals et al. (2015). This extension relies on an encoder LSTM to compute a context vector of each story from the image sequence. This context vector is used as the first state of multiple independent decoder LSTMs, each of which generates the portion of the story corresponding to each image in the sequence by taking the image embedding as the first input. Our model showed competitive results with the METEOR metric and human ratings in the internal track of the Visual Storytelling Challenge 2018.

LG · Sep 6, 2015
Sampled Weighted Min-Hashing for Large-Scale Topic Mining

Gibran Fuentes-Pineda, Ivan Vladimir Meza-Ruiz

We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to automatically mine topics from large-scale corpora. SWMH generates multiple random partitions of the corpus vocabulary based on term co-occurrence and agglomerates highly overlapping inter-partition cells to produce the mined topics. While other approaches define a topic as a probabilistic distribution over a vocabulary, SWMH topics are ordered subsets of such vocabulary. Interestingly, the topics mined by SWMH underlie themes from the corpus at different levels of granularity. We extensively evaluate the meaningfulness of the mined topics both qualitatively and quantitatively on the NIPS (1.7 K documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora. Additionally, we compare the quality of SWMH with Online LDA topics for document representation in classification.