Wei Chen

h-index21

3papers

45citations

Novelty35%

AI Score34

Ranked #109,711 of 194,257 authors (top 56%)#36,660 in CV (top 62%)

3 Papers

18.6CVFeb 5, 2024Code

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

Sheng Luo, Wei Chen, Wanxin Tian et al.

Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS

1.2SIMar 28, 2025Code

Correlation-Attention Masked Temporal Transformer for User Identity Linkage Using Heterogeneous Mobility Data

Ziang Yan, Xingyu Zhao, Hanqing Ma et al.

With the rise of social media and Location-Based Social Networks (LBSN), check-in data across platforms has become crucial for User Identity Linkage (UIL). These data not only reveal users' spatio-temporal information but also provide insights into their behavior patterns and interests. However, cross-platform identity linkage faces challenges like poor data quality, high sparsity, and noise interference, which hinder existing methods from extracting cross-platform user information. To address these issues, we propose a Correlation-Attention Masked Transformer for User Identity Linkage Network (MT-Link), a transformer-based framework to enhance model performance by learning spatio-temporal co-occurrence patterns of cross-platform users. Our model effectively captures spatio-temporal co-occurrence in cross-platform user check-in sequences. It employs a correlation attention mechanism to detect the spatio-temporal co-occurrence between user check-in sequences. Guided by attention weight maps, the model focuses on co-occurrence points while filtering out noise, ultimately improving classification performance. Experimental results show that our model significantly outperforms state-of-the-art baselines by 12.92%~17.76% and 5.80%~8.38% improvements in terms of Macro-F1 and Area Under Curve (AUC).

4.1LGMay 28, 2025

GUST: Quantifying Free-Form Geometric Uncertainty of Metamaterials Using Small Data

Jiahui Zheng, Cole Jahnke, Wei "Wayne" Chen

This paper introduces GUST (Generative Uncertainty learning via Self-supervised pretraining and Transfer learning), a framework for quantifying free-form geometric uncertainties inherent in the manufacturing of metamaterials. GUST leverages the representational power of deep generative models to learn a high-dimensional conditional distribution of as-fabricated unit cell geometries given nominal designs, thereby enabling uncertainty quantification. To address the scarcity of real-world manufacturing data, GUST employs a two-stage learning process. First, it leverages self-supervised pretraining on a large-scale synthetic dataset to capture the structure variability inherent in metamaterial geometries and an approximated distribution of as-fabricated geometries given nominal designs. Subsequently, GUST employs transfer learning by fine-tuning the pretrained model on limited real-world manufacturing data, allowing it to adapt to specific manufacturing processes and nominal designs. With only 960 unit cells additively manufactured in only two passes, GUST can capture the variability in geometry and effective material properties. In contrast, directly training a generative model on the same amount of real-world data proves insufficient, as demonstrated through both qualitative and quantitative comparisons. This scalable and cost-effective approach significantly reduces data requirements while maintaining the effectiveness in learning complex, real-world geometric uncertainties, offering an affordable method for free-form geometric uncertainty quantification in the manufacturing of metamaterials. The capabilities of GUST hold significant promise for high-precision industries such as aerospace and biomedical engineering, where understanding and mitigating manufacturing uncertainties are critical.