Yilun Lin

h-index19

13papers

147citations

Novelty47%

AI Score37

Ranked #91,983 of 194,257 authors (top 47%)#5,657 in AI (top 45%)

13 Papers

2.3SYJul 25, 2023

Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization

Shengyue Yao, Jingru Yu, Yi Yu et al.

With a growing complexity of the intelligent traffic system (ITS), an integrated control of ITS that is capable of considering plentiful heterogeneous intelligent agents is desired. However, existing control methods based on the centralized or the decentralized scheme have not presented their competencies in considering the optimality and the scalability simultaneously. To address this issue, we propose an integrated control method based on the framework of Decentralized Autonomous Organization (DAO). The proposed method achieves a global consensus on energy consumption efficiency (ECE), meanwhile to optimize the local objectives of all involved intelligent agents, through a consensus and incentive mechanism. Furthermore, an operation algorithm is proposed regarding the issue of structural rigidity in DAO. Specifically, the proposed operation approach identifies critical agents to execute the smart contract in DAO, which ultimately extends the capability of DAO-based control. In addition, a numerical experiment is designed to examine the performance of the proposed method. The experiment results indicate that the controlled agents can achieve a consensus faster on the global objective with improved local objectives by the proposed method, compare to existing decentralized control methods. In general, the proposed method shows a great potential in developing an integrated control system in the ITS

3.1SEJul 13, 2023

IR Design for Application-Specific Natural Language: A Case Study on Traffic Data

Wei Hu, Xuhong Wang, Ding Wang et al.

In the realm of software applications in the transportation industry, Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their ease of use and various other benefits. With the ceaseless progress in computer performance and the rapid development of large-scale models, the possibility of programming using natural language in specified applications - referred to as Application-Specific Natural Language (ASNL) - has emerged. ASNL exhibits greater flexibility and freedom, which, in turn, leads to an increase in computational complexity for parsing and a decrease in processing performance. To tackle this issue, our paper advances a design for an intermediate representation (IR) that caters to ASNL and can uniformly process transportation data into graph data format, improving data processing performance. Experimental comparisons reveal that in standard data query operations, our proposed IR design can achieve a speed improvement of over forty times compared to direct usage of standard XML format data.

10.7CRJul 15, 2024

Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

Xuhong Wang, Haoyu Jiang, Yi Yu et al.

Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensure data security. However, previous studies have primarily concentrated on the basic principles of algorithms and lacked a comprehensive analysis of watermarking theory and practice from the perspective of intelligent identification. To bridge this gap, firstly, we explore how a robust identity recognition system can be effectively implemented and managed within LLMs by various participants using watermarking technology. Secondly, we propose a mathematical framework based on mutual information theory, which systematizes the identification process to achieve more precise and customized watermarking. Additionally, we present a comprehensive evaluation of performance metrics for LLM watermarking, reflecting participant preferences and advancing discussions on its identification applications. Lastly, we outline the existing challenges in current watermarking technologies and theoretical frameworks, and provide directional guidance to address these challenges. Our systematic classification and detailed exposition aim to enhance the comparison and evaluation of various methods, fostering further research and development toward a transparent, secure, and equitable LLM ecosystem.

8.5AIDec 4, 2024Code

CredID: Credible Multi-Bit Watermark for Large Language Models Identification

Haoyu Jiang, Xuhong Wang, Ping Yi et al.

Large Language Models (LLMs) are widely used in complex natural language processing tasks but raise privacy and security concerns due to the lack of identity recognition. This paper proposes a multi-party credible watermarking framework (CredID) involving a trusted third party (TTP) and multiple LLM vendors to address these issues. In the watermark embedding stage, vendors request a seed from the TTP to generate watermarked text without sending the user's prompt. In the extraction stage, the TTP coordinates each vendor to extract and verify the watermark from the text. This provides a credible watermarking scheme while preserving vendor privacy. Furthermore, current watermarking algorithms struggle with text quality, information capacity, and robustness, making it challenging to meet the diverse identification needs of LLMs. Thus, we propose a novel multi-bit watermarking algorithm and an open-source toolkit to facilitate research. Experiments show our CredID enhances watermark credibility and efficiency without compromising text quality. Additionally, we successfully utilized this framework to achieve highly accurate identification among multiple LLM vendors.

1.2CYNov 6, 2023

Evolutionary City: Towards a Flexible, Agile and Symbiotic System

Xi Chen, Wei Hu, Jingru Yu et al.

Urban growth sometimes leads to rigid infrastructure that struggles to adapt to changing demand. This paper introduces a novel approach, aiming to enable cities to evolve and respond more effectively to such dynamic demand. It identifies the limitations arising from the complexity and inflexibility of existing urban systems. A framework is presented for enhancing the city's adaptability perception through advanced sensing technologies, conducting parallel simulation via graph-based techniques, and facilitating autonomous decision-making across domains through decentralized and autonomous organization and operation. Notably, a symbiotic mechanism is employed to implement these technologies practically, thereby making urban management more agile and responsive. In the case study, we explore how this approach can optimize traffic flow by adjusting lane allocations. This case not only enhances traffic efficiency but also reduces emissions. The proposed evolutionary city offers a new perspective on sustainable urban development, highliting the importance of integrated intelligence within urban systems.

2.0CVDec 14, 2024Code

Memory Efficient Matting with Adaptive Token Routing

Yiheng Lin, Yihan Hu, Chenyi Zhang et al.

Transformer-based models have recently achieved outstanding performance in image matting. However, their application to high-resolution images remains challenging due to the quadratic complexity of global self-attention. To address this issue, we propose MEMatte, a \textbf{m}emory-\textbf{e}fficient \textbf{m}atting framework for processing high-resolution images. MEMatte incorporates a router before each global attention block, directing informative tokens to the global attention while routing other tokens to a Lightweight Token Refinement Module (LTRM). Specifically, the router employs a local-global strategy to predict the routing probability of each token, and the LTRM utilizes efficient modules to simulate global attention. Additionally, we introduce a Batch-constrained Adaptive Token Routing (BATR) mechanism, which allows each router to dynamically route tokens based on image content and the stages of attention block in the network. Furthermore, we construct an ultra high-resolution image matting dataset, UHR-395, comprising 35,500 training images and 1,000 test images, with an average resolution of $4872\times6017$. This dataset is created by compositing 395 different alpha mattes across 11 categories onto various backgrounds, all with high-quality manual annotation. Extensive experiments demonstrate that MEMatte outperforms existing methods on both high-resolution and real-world datasets, significantly reducing memory usage by approximately 88% and latency by 50% on the Composition-1K benchmark. Our code is available at https://github.com/linyiheng123/MEMatte.

6.2CVMay 28, 2025

AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment

Yiheng Lin, Shifang Zhao, Ting Liu et al.

Personalized image generation aims to integrate user-provided concepts into text-to-image models, enabling the generation of customized content based on a given prompt. Recent zero-shot approaches, particularly those leveraging diffusion transformers, incorporate reference image information through multi-modal attention mechanism. This integration allows the generated output to be influenced by both the textual prior from the prompt and the visual prior from the reference image. However, we observe that when the prompt and reference image are misaligned, the generated results exhibit a stronger bias toward the textual prior, leading to a significant loss of reference content. To address this issue, we propose AlignGen, a Cross-Modality Prior Alignment mechanism that enhances personalized image generation by: 1) introducing a learnable token to bridge the gap between the textual and visual priors, 2) incorporating a robust training strategy to ensure proper prior alignment, and 3) employing a selective cross-modal attention mask within the multi-modal attention mechanism to further align the priors. Experimental results demonstrate that AlignGen outperforms existing zero-shot methods and even surpasses popular test-time optimization approaches.

3.6CVJan 16, 2025Code

SVIA: A Street View Image Anonymization Framework for Self-Driving Applications

Dongyu Liu, Xuhong Wang, Cen Chen et al.

In recent years, there has been an increasing interest in image anonymization, particularly focusing on the de-identification of faces and individuals. However, for self-driving applications, merely de-identifying faces and individuals might not provide sufficient privacy protection since street views like vehicles and buildings can still disclose locations, trajectories, and other sensitive information. Therefore, it remains crucial to extend anonymization techniques to street view images to fully preserve the privacy of users, pedestrians, and vehicles. In this paper, we propose a Street View Image Anonymization (SVIA) framework for self-driving applications. The SVIA framework consists of three integral components: a semantic segmenter to segment an input image into functional regions, an inpainter to generate alternatives to privacy-sensitive regions, and a harmonizer to seamlessly stitch modified regions to guarantee visual coherence. Compared to existing methods, SVIA achieves a much better trade-off between image generation quality and privacy protection, as evidenced by experimental results for five common metrics on two widely used public datasets.

3.3AIJun 27, 2025

Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios

Shengyue Yao, Runqing Guo, Yangyang Qin et al.

With the deep penetration of Artificial Intelligence (AI) in the transportation sector, intelligent cockpits, autonomous driving, and intelligent road networks are developing at an unprecedented pace. However, the data ecosystems of these three key areas are increasingly fragmented and incompatible. Especially, existing testing methods rely on data stacking, fail to cover all edge cases, and lack flexibility. To address this issue, this paper introduces the concept of "Query as Test" (QaT). This concept shifts the focus from rigid, prescripted test cases to flexible, on-demand logical queries against a unified data representation. Specifically, we identify the need for a fundamental improvement in data storage and representation, leading to our proposal of "Extensible Scenarios Notations" (ESN). ESN is a novel declarative data framework based on Answer Set Programming (ASP), which uniformly represents heterogeneous multimodal data from the cockpit, vehicle, and road as a collection of logical facts and rules. This approach not only achieves deep semantic fusion of data, but also brings three core advantages: (1) supports complex and flexible semantic querying through logical reasoning; (2) provides natural interpretability for decision-making processes; (3) allows for on-demand data abstraction through logical rules, enabling fine-grained privacy protection. We further elaborate on the QaT paradigm, transforming the functional validation and safety compliance checks of autonomous driving systems into logical queries against the ESN database, significantly enhancing the expressiveness and formal rigor of the testing. Finally, we introduce the concept of "Validation-Driven Development" (VDD), which suggests to guide developments by logical validation rather than quantitative testing in the era of Large Language Models, in order to accelerating the iteration and development process.

5.8LGNov 23, 2020

Improving Federated Relational Data Modeling via Basis Alignment and Weight Penalty

Yilun Lin, Chaochao Chen, Cen Chen et al.

Federated learning (FL) has attracted increasing attention in recent years. As a privacy-preserving collaborative learning paradigm, it enables a broader range of applications, especially for computer vision and natural language processing tasks. However, to date, there is limited research of federated learning on relational data, namely Knowledge Graph (KG). In this work, we present a modified version of the graph neural network algorithm that performs federated modeling over KGs across different participants. Specifically, to tackle the inherent data heterogeneity issue and inefficiency in algorithm convergence, we propose a novel optimization algorithm, named FedAlign, with 1) optimal transportation (OT) for on-client personalization and 2) weight constraint to speed up the convergence. Extensive experiments have been conducted on several widely used datasets. Empirical results show that our proposed method outperforms the state-of-the-art FL methods, such as FedAVG and FedProx, with better convergence.

4.1ROApr 25, 2020

GPO: Global Plane Optimization for Fast and Accurate Monocular SLAM Initialization

Sicong Du, Hengkai Guo, Yao Chen et al.

Initialization is essential to monocular Simultaneous Localization and Mapping (SLAM) problems. This paper focuses on a novel initialization method for monocular SLAM based on planar features. The algorithm starts by homography estimation in a sliding window. It then proceeds to a global plane optimization (GPO) to obtain camera poses and the plane normal. 3D points can be recovered using planar constraints without triangulation. The proposed method fully exploits the plane information from multiple frames and avoids the ambiguities in homography decomposition. We validate our algorithm on the collected chessboard dataset against baseline implementations and present extensive analysis. Experimental results show that our method outperforms the fine-tuned baselines in both accuracy and real-time.

12.1AIAug 6, 2018

An Efficient Deep Reinforcement Learning Model for Urban Traffic Control

Yilun Lin, Xingyuan Dai, Li Li et al.

Urban Traffic Control (UTC) plays an essential role in Intelligent Transportation System (ITS) but remains difficult. Since model-based UTC methods may not accurately describe the complex nature of traffic dynamics in all situations, model-free data-driven UTC methods, especially reinforcement learning (RL) based UTC methods, received increasing interests in the last decade. However, existing DL approaches did not propose an efficient algorithm to solve the complicated multiple intersections control problems whose state-action spaces are vast. To solve this problem, we propose a Deep Reinforcement Learning (DRL) algorithm that combines several tricks to master an appropriate control strategy within an acceptable time. This new algorithm relaxes the fixed traffic demand pattern assumption and reduces human invention in parameter tuning. Simulation experiments have shown that our method outperforms traditional rule-based approaches and has the potential to handle more complex traffic problems in the real world.

8.8LGJul 11, 2017

DeepTrend: A Deep Hierarchical Neural Network for Traffic Flow Prediction

Xingyuan Dai, Rui Fu, Yilun Lin et al.

In this paper, we consider the temporal pattern in traffic flow time series, and implement a deep learning model for traffic flow prediction. Detrending based methods decompose original flow series into trend and residual series, in which trend describes the fixed temporal pattern in traffic flow and residual series is used for prediction. Inspired by the detrending method, we propose DeepTrend, a deep hierarchical neural network used for traffic flow prediction which considers and extracts the time-variant trend. DeepTrend has two stacked layers: extraction layer and prediction layer. Extraction layer, a fully connected layer, is used to extract the time-variant trend in traffic flow by feeding the original flow series concatenated with corresponding simple average trend series. Prediction layer, an LSTM layer, is used to make flow prediction by feeding the obtained trend from the output of extraction layer and calculated residual series. To make the model more effective, DeepTrend needs first pre-trained layer-by-layer and then fine-tuned in the entire network. Experiments show that DeepTrend can noticeably boost the prediction performance compared with some traditional prediction models and LSTM with detrending based methods.