CV · Nov 4, 2025

ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology

arXiv:2511.02946v1 · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses multimodal learning challenges in ecology and offers a new approach to any-to-any generation and fusion analysis, but it appears incremental because it builds on existing masked reconstruction techniques.

The paper tackles the problem of generating multimodal representations for ecology by introducing ProM3E, a model that learns to infer missing modalities through masked reconstruction, achieving superior performance in cross-modal retrieval and linear probing tasks.
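To make "inferring missing modalities through masked reconstruction" concrete, here is a minimal sketch in embedding space. It assumes each modality has already been encoded into a fixed-length vector. The modality names, `placeholder_predictor` (a hypothetical stand-in for ProM3E's learned network, whose architecture is not described here), and the diagonal Gaussian negative log-likelihood used as the probabilistic objective are all illustrative assumptions, not the authors' method.

```python
import math
import random


def split_modalities(embeddings, num_context, rng):
    """Randomly keep `num_context` modalities as visible context; mask the rest."""
    names = sorted(embeddings)
    context = set(rng.sample(names, num_context))
    targets = [n for n in names if n not in context]
    return sorted(context), targets


def placeholder_predictor(context_vectors, dim):
    """HYPOTHETICAL stand-in for the learned model: predicts the mean of the
    context embeddings with a fixed unit variance in every dimension.
    A real model would also condition on which modality it is predicting."""
    mean = [sum(v[i] for v in context_vectors) / len(context_vectors) for i in range(dim)]
    var = [1.0] * dim
    return mean, var


def gaussian_nll(target, mean, var):
    """Negative log-likelihood of `target` under a diagonal Gaussian N(mean, var)."""
    return sum(
        0.5 * (math.log(2 * math.pi * s) + (t - m) ** 2 / s)
        for t, m, s in zip(target, mean, var)
    )


def masked_reconstruction_loss(embeddings, num_context, rng):
    """Mask some modalities, predict them from the context, and average the NLL."""
    context, targets = split_modalities(embeddings, num_context, rng)
    dim = len(next(iter(embeddings.values())))
    mean, var = placeholder_predictor([embeddings[c] for c in context], dim)
    return sum(gaussian_nll(embeddings[t], mean, var) for t in targets) / len(targets)


# Toy usage with hypothetical 2-D embeddings for four modalities.
emb = {"image": [1.0, 0.0], "audio": [0.0, 1.0], "text": [1.0, 1.0], "location": [0.5, 0.5]}
print(masked_reconstruction_loss(emb, num_context=2, rng=random.Random(0)))
```

Because the predictor outputs a variance as well as a mean, a trained model of this kind can report how uncertain it is about each reconstructed modality. This is the property the abstract relies on to decide what to fuse.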

We introduce ProM3E, a probabilistic masked multimodal embedding model for any-to-any generation of multimodal representations for ecology. ProM3E is based on masked modality reconstruction in the embedding space, learning to infer missing modalities given a few context modalities. By design, our model supports modality inversion in the embedding space. The probabilistic nature of our model allows us to analyse the feasibility of fusing various modalities for given downstream tasks, essentially learning what to fuse. Using these features of our model, we propose a novel cross-modal retrieval approach that mixes inter-modal and intra-modal similarities to achieve superior performance across all retrieval tasks. We further leverage the hidden representation from our model to perform linear probing tasks and demonstrate the superior representation learning capability of our model. All our code, datasets and model will be released at https://vishu26.github.io/prom3e.
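The retrieval idea in the abstract, mixing inter-modal and intra-modal similarities, can be sketched as a weighted sum of two cosine similarities. This assumes the model can invert a query embedding into the gallery's modality (`query_inverted` below), as the abstract's "modality inversion" suggests. The linear mixing rule, the weight `alpha`, and all function names are hypothetical. The abstract does not give ProM3E's actual scoring formula.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mixed_scores(query, query_inverted, gallery, alpha=0.5):
    """HYPOTHETICAL scoring rule: mix an inter-modal similarity (query vs.
    gallery item, different modalities) with an intra-modal similarity
    (query inverted into the gallery modality vs. gallery item)."""
    return [
        alpha * cosine(query, g) + (1 - alpha) * cosine(query_inverted, g)
        for g in gallery
    ]


def retrieve(query, query_inverted, gallery, alpha=0.5):
    """Return gallery indices ordered from best to worst mixed score."""
    scores = mixed_scores(query, query_inverted, gallery, alpha)
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy usage: an image-side query, its (assumed) inversion into the audio
# modality, and a small gallery of audio embeddings.
query = [1.0, 0.0]
query_inverted = [0.0, 1.0]
gallery = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
print(retrieve(query, query_inverted, gallery, alpha=1.0))   # inter-modal only
print(retrieve(query, query_inverted, gallery, alpha=0.25))  # weighted toward intra-modal
```

With `alpha=1.0` the ranking uses only the inter-modal similarity. Lowering `alpha` shifts weight to the intra-modal comparison, which in this toy data moves the second gallery item to the top. In practice the weight would be tuned on validation data.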
