CVDec 22, 2023

A Language-based solution to enable Metaverse Retrieval

arXiv:2312.14630v19 citationsh-index: 7MMM
Originality Synthesis-oriented
AI Analysis

This addresses the lack of search engines for Metaverse discovery, which is a problem for users seeking relevant virtual worlds, but it is incremental as it adapts existing retrieval methods to a new domain.

The paper tackles the problem of searching for Metaverse virtual worlds by introducing a Text-to-Metaverse retrieval task, using language descriptions to match user interests, and reports results on a newly collected dataset of 33,000 Metaverses with a deep learning framework.

Recently, the Metaverse is becoming increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do users find the one Metaverse which best fits their current interests? So far, the search process is mostly done by word of mouth, or by advertisement on technology-oriented websites. However, the lack of search engines similar to those available for other multimedia formats (e.g., YouTube for videos) is showing its limitations, since it is often cumbersome to find a Metaverse based on some specific interests using the available methods, while also making it difficult to discover user-created ones which lack strong advertisement. To address this limitation, we propose to use language to naturally describe the desired contents of the Metaverse a user wishes to find. Second, we highlight that, differently from more conventional 3D scenes, Metaverse scenarios represent a more complex data format since they often contain one or more types of multimedia which influence the relevance of the scenario itself to a user query. Therefore, in this work, we create a novel task, called Text-to-Metaverse retrieval, which aims at modeling these aspects while also taking the cross-modal relations with the textual data into account. Since we are the first ones to tackle this problem, we also collect a dataset of 33000 Metaverses, each of which consists of a 3D scene enriched with multimedia content. Finally, we design and implement a deep learning framework based on contrastive learning, resulting in a thorough experimental setup.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes