Enric Meinhardt-Llopis

CV
h-index15
8papers
26citations
Novelty44%
AI Score44

8 Papers

32.8LGMay 26
SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings

Léo Nicollier, Max Dunitz, Marc Pic et al.

A fundamental open question in self-supervised learning (SSL) is the explicit characterization of the optimal geometry of the learned representations. Recently, LeJEPA identified isotropic Gaussian embeddings as optimal for minimizing downstream prediction risk in Euclidean spaces. However, the corresponding problem for distributions supported on lower-dimensional manifolds, such as the hypersphere, remains unexplored. In this work, we demonstrate that extending this minimax analysis to smooth distributions on Riemannian manifolds fundamentally changes the optimal solution. We show that, under a worst-case formulation, both k-nearest neighbors and kernel ridge regression induce hyperspherical uniformity. More precisely, we show that uniform distributions on manifolds are optimal for k-nearest neighbors, and that the uniform distribution on the sphere is optimal for kernel ridge regression with both the exponential dot-product kernel and the linear kernel. This theoretical insight reveals a fundamental limitation of Gaussian embeddings: their non-uniform density induces anisotropic k-NN neighborhoods, severely biasing the estimator. To correct this, we introduce SPHERE-JEPA, a theoretically grounded SSL framework. We adapt LeJEPA's Cram{é}r-Wold projection mechanism to enforce hyperspherical uniformity rather than a Gaussian prior. Empirically, SPHERE-JEPA yields significant improvements, boosting texture retrieval mAP by over 6%, while consistently matching or outperforming LeJEPA on standard benchmarks-including a +1.8% linear probing gain on ImageNet-1K (ViT-B/14).

CVFeb 17
An Industrial Dataset for Scene Acquisitions and Functional Schematics Alignment

Flavien Armangeon, Thibaud Ehret, Enric Meinhardt-Llopis et al.

Aligning functional schematics with 2D and 3D scene acquisitions is crucial for building digital twins, especially for old industrial facilities that lack native digital models. Current manual alignment using images and LiDAR data does not scale due to tediousness and complexity of industrial sites. Inconsistencies between schematics and reality, and the scarcity of public industrial datasets, make the problem both challenging and underexplored. This paper introduces IRIS-v2, a comprehensive dataset to support further research. It includes images, point clouds, 2D annotated boxes and segmentation masks, a CAD model, 3D pipe routing information, and the P&ID (Piping and Instrumentation Diagram). The alignment is experimented on a practical case study, aiming at reducing the time required for this task by combining segmentation and graph matching.

CVDec 2, 2025
Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle et al.

Image-based localization in GNSS-denied environments is critical for UAV autonomy. Existing state-of-the-art approaches rely on matching UAV images to geo-referenced satellite images; however, they typically require large-scale, paired UAV-satellite datasets for training. Such data are costly to acquire and often unavailable, limiting their applicability. To address this challenge, we adopt a training paradigm that removes the need for UAV imagery during training by learning directly from satellite-view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real-world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real-world UAV images that we release to the community. Our method achieves competitive performance compared to approaches trained with paired data, demonstrating its effectiveness and strong generalization capabilities.

CVApr 9, 2025
S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications

Elías Masquil, Roger Marí, Thibaud Ehret et al.

We introduce the S-EO dataset: a large-scale, high-resolution dataset, designed to advance geometry-aware shadow detection. Collected from diverse public-domain sources, including challenge datasets and government providers such as USGS, our dataset comprises 702 georeferenced tiles across the USA, each covering 500x500 m. Each tile includes multi-date, multi-angle WorldView-3 pansharpened RGB images, panchromatic images, and a ground-truth DSM of the area obtained from LiDAR scans. For each image, we provide a shadow mask derived from geometry and sun position, a vegetation mask based on the NDVI index, and a bundle-adjusted RPC model. With approximately 20,000 images, the S-EO dataset establishes a new public resource for shadow detection in remote sensing imagery and its applications to 3D reconstruction. To demonstrate the dataset's impact, we train and evaluate a shadow detector, showcasing its ability to generalize, even to aerial images. Finally, we extend EO-NeRF - a state-of-the-art NeRF approach for satellite imagery - to leverage our shadow predictions for improved 3D reconstructions.

CVApr 9, 2024
Leveraging edge detection and neural networks for better UAV localization

Theo Di Piazza, Enric Meinhardt-Llopis, Gabriele Facciolo et al.

We propose a novel method for geolocalizing Unmanned Aerial Vehicles (UAVs) in environments lacking Global Navigation Satellite Systems (GNSS). Current state-of-the-art techniques employ an offline-trained encoder to generate a vector representation (embedding) of the UAV's current view, which is then compared with pre-computed embeddings of geo-referenced images to determine the UAV's position. Here, we demonstrate that the performance of these methods can be significantly enhanced by preprocessing the images to extract their edges, which exhibit robustness to seasonal and illumination variations. Furthermore, we establish that utilizing edges enhances resilience to orientation and altitude inaccuracies. Additionally, we introduce a confidence criterion for localization. Our findings are substantiated through synthetic experiments.

CVMar 1, 2021
Automatic Stockpile Volume Monitoring using Multi-view Stereo from SkySat Imagery

Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis et al.

This paper proposes a system for automatic surface volume monitoring from time series of SkySat pushframe imagery. A specific challenge of building and comparing large 3D models from SkySat data is to correct inconsistencies between the camera models associated to the multiple views that are necessary to cover the area at a given time, where these camera models are represented as Rational Polynomial Cameras (RPCs). We address the problem by proposing a date-wise RPC refinement, able to handle dynamic areas covered by sets of partially overlapping views. The cameras are refined by means of a rotation that compensates for errors due to inaccurate knowledge of the satellite attitude. The refined RPCs are then used to reconstruct multiple consistent Digital Surface Models (DSMs) from different stereo pairs at each date. RPC refinement strengthens the consistency between the DSMs of each date, which is extremely beneficial to accurately measure volumes in the 3D surface models. The system is tested in a real case scenario, to monitor large coal stockpiles. Our volume estimates are validated with measurements collected on site in the same period of time.

CVJan 19, 2017
Accurate Motion Estimation through Random Sample Aggregated Consensus

Martin Rais, Gabriele Facciolo, Enric Meinhardt-Llopis et al.

We reconsider the classic problem of estimating accurately a 2D transformation from point matches between images containing outliers. RANSAC discriminates outliers by randomly generating minimalistic sampled hypotheses and verifying their consensus over the input data. Its response is based on the single hypothesis that obtained the largest inlier support. In this article we show that the resulting accuracy can be improved by aggregating all generated hypotheses. This yields RANSAAC, a framework that improves systematically over RANSAC and its state-of-the-art variants by statistically aggregating hypotheses. To this end, we introduce a simple strategy that allows to rapidly average 2D transformations, leading to an almost negligible extra computational cost. We give practical applications on projective transforms and homography+distortion models and demonstrate a significant performance gain in both cases.

CVFeb 29, 2016
FALDOI: A new minimization strategy for large displacement variational optical flow

Roberto P. Palomares, Enric Meinhardt-Llopis, Coloma Ballester et al.

We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional. The method requires a given set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from those matches. In particular, the matches are used to guide a structured coordinate-descent of the energy functional around these keypoints. It results in a two-step minimization method at the finest scale which is very robust to the inevitable outliers of the sparse matcher and able to capture large displacements of small objects. Its benefits over other variational methods that also rely on a set of sparse matches are its robustness against very few matches, high levels of noise and outliers. We validate our proposal using several optical flow variational models. The results consistently outperform the coarse-to-fine approaches and achieve good qualitative and quantitative performance on the standard optical flow benchmarks.