AINov 13, 2025
MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum FusionHaolong Xiang, Peisi Wang, Xiaolong Xu et al.
With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for traffic signal modeling often rely on the original data modality, i.e., numerical direct readings from the sensors in cities. However, this unimodal approach overlooks the semantic information existing in multimodal heterogeneous urban data in different perspectives, which hinders a comprehensive understanding of traffic signals and limits the accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features through numeric, visual, and textual perspectives. The three branches drive for a multimodal perspective of urban traffic signal learning in the frequency domain, while the frequency learning strategies delicately refine the information for extraction. Specifically, we first conduct the visual augmentation for the traffic signals, which transforms the original modality into frequency images and periodicity images for visual learning. Also, we augment descriptive texts for the traffic signals based on the specific topic, background information and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning on the three branches to fuse the spectrum of three modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with the state-of-the-art approaches.
CVMar 13, 2024
AFGI: Towards Accurate and Fast-convergent Gradient Inversion Attack in Federated LearningCan Liu, Jin Wang, and Yipeng Zhou et al.
Federated learning (FL) empowers privacypreservation in model training by only exposing users' model gradients. Yet, FL users are susceptible to gradient inversion attacks (GIAs) which can reconstruct ground-truth training data such as images based on model gradients. However, reconstructing high-resolution images by existing GIAs faces two challenges: inferior accuracy and slow-convergence, especially when duplicating labels exist in the training batch. To address these challenges, we present an Accurate and Fast-convergent Gradient Inversion attack algorithm, called AFGI, with two components: Label Recovery Block (LRB) which can accurately restore duplicating labels of private images based on exposed gradients; VME Regularization Term, which includes the total variance of reconstructed images, the discrepancy between three-channel means and edges, between values from exposed gradients and reconstructed images, respectively. The AFGI can be regarded as a white-box attack strategy to reconstruct images by leveraging labels recovered by LRB. In particular, AFGI is efficient that accurately reconstruct ground-truth images when users' training batch size is up to 48. Our experimental results manifest that AFGI can diminish 85% time costs while achieving superb inversion quality in the ImageNet dataset. At last, our study unveils the shortcomings of FL in privacy-preservation, prompting the development of more advanced countermeasure strategies.
CVMay 25, 2023
CUEING: a lightweight model to Capture hUman attEntion In driviNGLinfeng Liang, Yao Deng, Yang Zhang et al.
Discrepancies in decision-making between Autonomous Driving Systems (ADS) and human drivers underscore the need for intuitive human gaze predictors to bridge this gap, thereby improving user trust and experience. Existing gaze datasets, despite their value, suffer from noise that hampers effective training. Furthermore, current gaze prediction models exhibit inconsistency across diverse scenarios and demand substantial computational resources, restricting their on-board deployment in autonomous vehicles. We propose a novel adaptive cleansing technique for purging noise from existing gaze datasets, coupled with a robust, lightweight convolutional self-attention gaze prediction model. Our approach not only significantly enhances model generalizability and performance by up to 12.13% but also ensures a remarkable reduction in model complexity by up to 98.2% compared to the state-of-the art, making in-vehicle deployment feasible to augment ADS decision visualization and performance.
IVMay 8, 2020
Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Images for SegmentationShuchao Pang, Anan Du, Mehmet A. Orgun et al.
Automatic tumor segmentation is a crucial step in medical image analysis for computer-aided diagnosis. Although the existing methods based on convolutional neural networks (CNNs) have achieved the state-of-the-art performance, many challenges still remain in medical tumor segmentation. This is because regular CNNs can only exploit translation invariance, ignoring further inherent symmetries existing in medical images such as rotations and reflections. To mitigate this shortcoming, we propose a novel group equivariant segmentation framework by encoding those inherent symmetries for learning more precise representations. First, kernel-based equivariant operations are devised on every orientation, which can effectively address the gaps of learning symmetries in existing approaches. Then, to keep segmentation networks globally equivariant, we design distinctive group layers with layerwise symmetry constraints. By exploiting further symmetries, novel segmentation CNNs can dramatically reduce the sample complexity and the redundancy of filters (by roughly 2/3) over regular CNNs. More importantly, based on our novel framework, we show that a newly built GER-UNet outperforms its regular CNN-based counterpart and the state-of-the-art segmentation methods on real-world clinical data. Specifically, the group layers of our segmentation framework can be seamlessly integrated into any popular CNN-based segmentation architectures.