CV · Apr 25, 2022
A Simple Structure For Building A Robust Model
Xiao Tan, Jingbo Gao, Ruolin Li
As deep learning applications, especially computer vision programs, are increasingly deployed in everyday life, the security of these applications demands more urgent attention. One effective way to improve the security of deep learning models is adversarial training, which makes the model robust to samples deliberately crafted to attack it. Building on this, we propose a simple architecture for training a model with a certain degree of robustness: it adds an adversarial sample detection network that is trained cooperatively with the main network. We also design a new data sampling strategy that incorporates multiple existing attacks, allowing the model to adapt to many different adversarial attacks in a single training run. We conducted experiments on the CIFAR-10 dataset to test the effectiveness of this design, and the results indicate that it has some positive effect on the robustness of the model. Our code is available at https://github.com/dowdyboy/simple_structure_for_robust_model.
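The sampling strategy above mixes several attacks into one training stream. The sketch below shows one way to do that. It is not the code in the linked repository. Several parts are assumptions: the toy logistic model, the choice of FGSM and L∞ PGD as the attack pool, uniform per-example attack selection, and the hypothetical function `mixed_batch`. Each example is returned with its class label and an `is_adversarial` flag, which is the kind of target a cooperatively trained detection network would need.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def input_grad(w, b, x, y):
    """d(binary cross-entropy)/dx for a toy logistic model p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return float((v > 0) - (v < 0))

def fgsm(w, b, x, y, eps, rng):
    """FGSM: one step of size eps along the sign of the input gradient."""
    g = input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, rng, steps=5):
    """L-inf PGD: random start, repeated signed steps, projection onto the eps-ball."""
    alpha = 2.5 * eps / steps
    adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        g = input_grad(w, b, adv, y)
        adv = [a + alpha * sign(gi) for a, gi in zip(adv, g)]
        adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x)]
    return adv

# Hypothetical attack pool; the attacks used in the paper may differ.
ATTACKS = {"fgsm": fgsm, "pgd": pgd}

def mixed_batch(w, b, batch, eps, adv_ratio, rng):
    """Return (x, class_label, is_adversarial) triples for classifier + detector training."""
    out = []
    for x, y in batch:
        if rng.random() < adv_ratio:
            name = rng.choice(sorted(ATTACKS))  # attack drawn uniformly per example
            out.append((ATTACKS[name](w, b, x, y, eps, rng), y, 1))
        else:
            out.append((list(x), y, 0))
    return out
```

In this sketch, the class label would supervise the main network and the flag would supervise the detection network. The abstract does not say how the two losses are combined.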
CV · Nov 7, 2025
A Dual-stage Prompt-driven Privacy-preserving Paradigm for Person Re-Identification
Ruolin Li, Min Liu, Yuan Bian et al.
With growing concerns over data privacy, researchers have started using virtual data as an alternative to sensitive real-world images for training person re-identification (Re-ID) models. However, existing virtual datasets produced by game engines still face challenges such as complex construction and poor domain generalization, making them difficult to apply in real scenarios. To address these challenges, we propose a Dual-stage Prompt-driven Privacy-preserving Paradigm (DPPP). In the first stage, we generate rich prompts incorporating multi-dimensional attributes such as pedestrian appearance, illumination, and viewpoint. These prompts drive a diffusion model to synthesize diverse data end-to-end, building a large-scale virtual dataset named GenePerson with 130,519 images of 6,641 identities. In the second stage, we propose a Prompt-driven Disentanglement Mechanism (PDM) to learn domain-invariant features that generalize across domains. With the aid of contrastive learning, we employ two textual inversion networks to map images into pseudo-words representing style and content, respectively. These pseudo-words are used to construct style-disentangled content prompts that guide the model to learn domain-invariant content features at the image level. Experiments demonstrate that models trained on GenePerson with PDM achieve state-of-the-art generalization performance, surpassing models trained on popular real and virtual Re-ID datasets.
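The first stage combines attribute values into text prompts. The sketch below shows that idea under stated assumptions. The attribute vocabularies, the prompt template, and the functions `identity_prompts` and `build_manifest` are all hypothetical and are not taken from GenePerson. One appearance description is treated as one identity, and illumination and viewpoint vary across that identity's images.

```python
import itertools
import random

# Hypothetical attribute vocabularies; the real GenePerson prompt lists are not given here.
ATTRIBUTES = {
    "appearance": [
        "a young man in a red hoodie and black jeans",
        "a woman in a long white coat carrying a handbag",
        "an elderly man in a grey suit with a backpack",
    ],
    "illumination": ["bright midday sunlight", "dim indoor lighting", "overcast evening light"],
    "viewpoint": ["front view", "side view", "back view"],
}

# Hypothetical prompt template.
TEMPLATE = "a full-body photo of {appearance}, {viewpoint}, {illumination}, surveillance camera"

def identity_prompts(appearance, n_images, rng):
    """Prompts for one identity: appearance fixed, illumination and viewpoint varied."""
    combos = list(itertools.product(ATTRIBUTES["illumination"], ATTRIBUTES["viewpoint"]))
    chosen = rng.sample(combos, n_images)  # distinct (illumination, viewpoint) pairs
    return [TEMPLATE.format(appearance=appearance, illumination=il, viewpoint=vp)
            for il, vp in chosen]

def build_manifest(n_images, seed=0):
    """Map identity ID -> list of prompts, one identity per appearance description."""
    rng = random.Random(seed)
    return {pid: identity_prompts(app, n_images, rng)
            for pid, app in enumerate(ATTRIBUTES["appearance"])}
```

Each prompt would then be passed to a text-to-image diffusion model, which is not shown. The abstract does not describe how DPPP keeps a person's identity consistent across generated images.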
CV · Jul 10, 2024
Micro-Expression Recognition by Motion Feature Extraction based on Pre-training
Ruolin Li, Lu Wang, Tingting Yang et al.
Micro-expressions (MEs) are spontaneous, unconscious facial expressions with promising applications in fields such as psychotherapy and national security. Micro-expression recognition (MER) has therefore attracted increasing attention from researchers. Although various MER methods have emerged, especially with the development of deep learning techniques, the task still faces several challenges, e.g., subtle motion and limited training data. To address these problems, we propose a novel motion extraction strategy (MoExt) for the MER task and use additional macro-expression data in the pre-training process. We primarily pretrain the feature separator and motion extractor using a contrastive loss, enabling them to extract representative motion features. In MoExt, shape features and texture features are first extracted separately from the onset and apex frames. Motion features related to MEs are then extracted from the shape features of both frames. To help the model separate features more effectively, we use the extracted motion features and the texture features of the onset frame to reconstruct the apex frame. Through pre-training, the module learns to extract inter-frame motion features of facial expressions while excluding irrelevant information. The feature separator and motion extractor are finally integrated into the MER network, which is then fine-tuned on the target ME data. The effectiveness of the proposed method is validated on three commonly used datasets (CASME II, SMIC, and SAMM) and on the CAS(ME)³ dataset. The results show that our method performs favorably against state-of-the-art methods.
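The abstract says the feature separator and motion extractor are pretrained with a contrastive loss but does not give its form. As an assumption, the sketch below uses InfoNCE, one common contrastive loss. It scores how well an anchor feature picks out its positive among negatives, using temperature-scaled cosine similarity and a numerically stable log-sum-exp. In a MoExt-like setting, the anchor and positive might be motion features from two views of the same onset/apex pair, with negatives taken from other clips. That pairing is hypothetical.

```python
import math

def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: cross-entropy of selecting the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating (stable log-sum-exp)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]
```

The loss is near zero when the anchor is aligned with its positive and orthogonal or opposed to the negatives. It grows as the positive becomes no more similar than the negatives.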
LG · May 3, 2024
Large Language Models for Mobility Analysis in Transportation Systems: A Survey on Forecasting Tasks
Zijian Zhang, Yujie Sun, Zepu Wang et al.
Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban planning. Machine learning and deep learning methods are favored for their flexibility and accuracy. Nowadays, with the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors. However, there is a lack of comprehensive studies on how LLMs can contribute to this field. This survey explores existing approaches using LLMs for time series forecasting problems for mobility in transportation systems. We provide a literature review concerning the forecasting applications within transportation systems, elucidating how researchers utilize LLMs, showcasing recent state-of-the-art advancements, and identifying the challenges that must be overcome to fully leverage LLMs in this domain.
LG · Jan 2, 2025
Graph2text or Graph2token: A Perspective of Large Language Models for Graph Learning
Shuo Yu, Yingbo Wang, Ruolin Li et al.
Graphs are data structures used to represent irregular networks and are prevalent in numerous real-world applications. Previous methods directly model graph structures and achieve significant success. However, these methods encounter bottlenecks due to the inherent irregularity of graphs. An innovative solution is converting graphs into textual representations, thereby harnessing the powerful capabilities of Large Language Models (LLMs) to process and comprehend graphs. In this paper, we present a comprehensive review of methodologies for applying LLMs to graphs, termed LLM4graph. The core of LLM4graph lies in transforming graphs into texts for LLMs to understand and analyze. Thus, we propose a novel taxonomy of LLM4graph methods in the view of the transformation. Specifically, existing methods can be divided into two paradigms: Graph2text and Graph2token, which transform graphs into texts or tokens as the input of LLMs, respectively. We point out four challenges during the transformation to systematically present existing methods in a problem-oriented perspective. For practical concerns, we provide a guideline for researchers on selecting appropriate models and LLMs for different graphs and hardware constraints. We also identify five future research directions for LLM4graph.
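A minimal sketch of the Graph2text idea turns a small undirected, node-labelled graph into a natural-language prompt that an LLM can read. The function name `graph_to_text` and the adjacency-list wording are hypothetical. Surveyed methods use many different serializations, such as edge lists, adjacency lists, and graph description languages. Graph2token, which feeds graph representations to the LLM as tokens rather than text, is not shown.

```python
def graph_to_text(labels, edges, question):
    """Serialize an undirected, node-labelled graph into a natural-language prompt."""
    adjacency = {node: set() for node in labels}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    n_edges = len({frozenset(e) for e in edges})  # count each undirected edge once
    lines = [f"The following undirected graph has {len(labels)} nodes and {n_edges} edges."]
    for node in sorted(labels):
        if adjacency[node]:
            nbrs = ", ".join(str(m) for m in sorted(adjacency[node]))
            lines.append(f"Node {node} ({labels[node]}) is connected to: {nbrs}.")
        else:
            lines.append(f"Node {node} ({labels[node]}) has no neighbors.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Sorting nodes and neighbors makes the text deterministic, so the same graph always yields the same prompt.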
SY · Apr 12
When Altruism Meets Autonomy: Managing Bottleneck Congestion with Strategic Autonomous Vehicles
Kexin Wang, Haohui He, Ruolin Li
Weaving ramps are critical bottlenecks in highway networks due to conflicting traffic flows and complex interactions among heterogeneous vehicle types. In mixed-autonomy settings, the presence of controllable autonomous vehicles (AVs) introduces new opportunities to influence system-level outcomes, yet the structural impact of such control remains poorly understood. This paper develops a unified equilibrium framework to capture, predict, and optimize aggregate lane-choice behavior in weaving ramps with heterogeneous vehicle populations. We first formulate a Wardrop-based model capturing the selfish behavior of human-driven vehicles (HDVs) and establish existence, uniqueness, and validity of the resulting equilibrium. We then introduce a Stackelberg–Wardrop formulation in which AVs act as strategic leaders optimizing system performance, while HDVs respond through equilibrium adaptation. The framework is further generalized to incorporate heterogeneous behavioral preferences of HDVs and AVs via a Social Value Orientation (SVO) model. Our analysis reveals a fundamental structural property of mixed-autonomy traffic systems: under selfish HDV behavior, the impact of AV penetration is inherently non-increasing, exhibiting plateau regions where performance remains unchanged and improves only at critical thresholds. These results provide principled guidance for the design of AV control and incentive mechanisms in the presence of selfish human behavior, and demonstrate how strategically controlled autonomous agents can be deployed to induce system-level efficiency gains in mixed-autonomy transportation networks.
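The Stackelberg–Wardrop structure can be illustrated on a far simpler network than a weaving ramp: two parallel lanes with linear costs c_i(x) = a_i + b_i·x. AVs, as leaders, choose their lane split to minimize total cost. HDVs then follow with a Wardrop equilibrium, meaning every lane they use has equal cost. The linear costs, the `Lane` class, the grid search over the leader's split, and the function names are all hypothetical simplifications. The paper's weaving-ramp model and its SVO extension are not reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lane:
    free_cost: float  # a: cost at zero flow (hypothetical linear model)
    slope: float      # b: cost increase per unit of flow

    def cost(self, flow):
        return self.free_cost + self.slope * flow

def hdv_wardrop(l1, l2, s1, s2, h):
    """HDV split (h1, h2) equalizing lane costs, given fixed AV flows s1 and s2."""
    denom = l1.slope + l2.slope
    if denom <= 0:
        raise ValueError("need at least one strictly increasing lane cost")
    h1 = (l2.free_cost + l2.slope * (s2 + h) - l1.free_cost - l1.slope * s1) / denom
    h1 = min(max(h1, 0.0), h)  # corner case: all HDVs take the cheaper lane
    return h1, h - h1

def total_cost(l1, l2, x1, x2):
    return x1 * l1.cost(x1) + x2 * l2.cost(x2)

def stackelberg(l1, l2, demand, av_share, grid=100):
    """Leader (AVs) grid-searches its split; followers (HDVs) respond with Wardrop.

    Returns (best total cost, AV flow on lane 1).
    """
    av = av_share * demand
    hdv = demand - av
    best = None
    for k in range(grid + 1):
        s1 = av * k / grid
        s2 = av - s1
        h1, h2 = hdv_wardrop(l1, l2, s1, s2, hdv)
        c = total_cost(l1, l2, s1 + h1, s2 + h2)
        if best is None or c < best[0]:
            best = (c, s1)
    return best
```

Consider the Pigou-style lanes `Lane(0.0, 1.0)` and `Lane(1.0, 0.0)` with unit demand. The search returns a total cost of 1.0 when `av_share=0.0` (pure Wardrop, all traffic on lane 1). It returns 0.8125 at `av_share=0.25` and the system-optimal 0.75 from `av_share=0.5` onward.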