Juan Terven

h-index: 17

4 papers

2,773 citations

Novelty: 10%

AI Score: 25

Ranked #164,072 of 194,257 authors (top 84%); #52,812 in CV (top 89%)

4 Papers

34.3 · CV · Apr 2, 2023

A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS

Juan Terven, Diana Cordova-Esparza

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

20.4 · LG · Jul 5, 2023

Loss Functions and Metrics in Deep Learning

Juan Terven, Diana M. Cordova-Esparza, Alfonso Ramirez-Pedraza et al.

This paper presents a comprehensive review of loss functions and performance metrics in deep learning, highlighting key developments and practical insights across diverse application areas. We begin by outlining fundamental considerations in classic tasks such as regression and classification, then extend our analysis to specialized domains like computer vision and natural language processing, including retrieval-augmented generation. In each setting, we systematically examine how different loss functions and evaluation metrics can be paired to address task-specific challenges such as class imbalance, outliers, and sequence-level optimization. Key contributions of this work include: (1) a unified framework for understanding how losses and metrics align with different learning objectives, (2) an in-depth discussion of multi-loss setups that balance competing goals, and (3) new insights into specialized metrics used to evaluate modern applications like retrieval-augmented generation, where faithfulness and context relevance are pivotal. Along the way, we highlight best practices for selecting or combining losses and metrics based on empirical behaviors and domain constraints. Finally, we identify open problems and promising directions, including the automation of loss-function search and the development of robust, interpretable evaluation measures for increasingly complex deep learning tasks. Our review aims to equip researchers and practitioners with clearer guidance in designing effective training pipelines and reliable model assessments for a wide spectrum of real-world applications.

2.0 · CV · Jan 17, 2024 · Code

Land Cover Image Classification

Antonio Rangel, Juan Terven, Diana M. Cordova-Esparza et al.

Land Cover (LC) image classification has become increasingly significant in understanding environmental changes, urban planning, and disaster management. However, traditional LC methods are often labor-intensive and prone to human error. This paper explores state-of-the-art deep learning models for enhanced accuracy and efficiency in LC analysis. We compare convolutional neural networks (CNNs) against transformer-based methods, showcasing their applications and advantages in LC studies. We used EuroSAT, a patch-based LC classification dataset based on Sentinel-2 satellite images, and achieved state-of-the-art results using current transformer models.

0.9 · CV · Apr 6, 2018 · Code

Telepresence System based on Simulated Holographic Display

Diana-Margarita Córdova-Esparza, Juan Terven, Hugo Jiménez-Hernández et al.

We present a telepresence system based on a custom-made simulated holographic display that produces a full 3D model of the remote participants using commodity depth sensors. Our display is composed of a video projector and a quadrangular acrylic pyramid, which allows the user to experience an omnidirectional visualization of a remote person without the need for head-mounted displays. To obtain a precise representation of the participants, we fuse multiple views extracted using a deep background subtraction method. Our system represents an attempt to democratize high-fidelity 3D telepresence using off-the-shelf components.