Yichao Liu

CV
h-index19
9papers
1,062citations
Novelty52%
AI Score57

9 Papers

AINov 29, 2023
Exploring Large Language Models for Human Mobility Prediction under Public Events

Yuebing Liang, Yichao Liu, Xiaohan Wang et al.

Public events, such as concerts and sports games, can be major attractors for large crowds, leading to irregular surges in travel demand. Accurate human mobility prediction for public events is thus crucial for event planning as well as traffic or crowd management. While rich textual descriptions about public events are commonly available from online sources, it is challenging to encode such information in statistical or machine learning models. Existing methods are generally limited in incorporating textual information, handling data sparsity, or providing rationales for their predictions. To address these challenges, we introduce a framework for human mobility prediction under public events (LLM-MPE) based on Large Language Models (LLMs), leveraging their unprecedented ability to process textual data, learn from minimal examples, and generate human-readable explanations. Specifically, LLM-MPE first transforms raw, unstructured event descriptions from online sources into a standardized format, and then segments historical mobility data into regular and event-related components. A prompting strategy is designed to direct LLMs in making and rationalizing demand predictions considering historical mobility and event features. A case study is conducted for Barclays Center in New York City, based on publicly available event information and taxi trip data. Results show that LLM-MPE surpasses traditional models, particularly on event days, with textual data significantly enhancing its accuracy. Furthermore, LLM-MPE offers interpretable insights into its predictions. Despite the great potential of LLMs, we also identify key challenges including misinformation and high costs that remain barriers to their broader adoption in large-scale human mobility analysis.

8.8CVApr 18
Rethinking Cross-Dose PET Denoising: Mitigating Averaging Effects via Residual Noise Learning

Yichao Liu, Zongru Shao, Yueyang Teng et al.

Cross-dose denoising for low-dose positron emission tomography (LDPET) has been proposed to address the limited generalization of models trained at a single noise level. In practice, neural networks trained on a specific dose level often fail to generalize to other dose conditions due to variations in noise magnitude and statistical properties. Conventional "one-size-for-all" models attempt to handle this variability but tend to learn averaged representations across noise levels, resulting in degraded performance. In this work, we analyze this limitation and show that standard training formulations implicitly optimize an expectation over heterogeneous noise distributions. To this end, we propose a unified residual noise learning framework that estimates noise directly from low-dose PET images rather than predicting full-dose images. Experiments on large-scale multi-dose PET datasets from two medical centers demonstrate that the proposed method outperforms the "one-size-for-all" model, individual dose-specific U-Net models, and dose-conditioned approaches, achieving improved denoising performance. These results indicate that residual noise learning effectively mitigates the averaging effect and enhances generalization for cross-dose PET denoising.

64.4AIMay 15
DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

Yichao Liu, Huawen Shen, Liu Yu et al.

GUI agents powered by Multimodal Large Language Models (MLLMs) have demonstrated impressive capability in understanding and executing user instructions. However, accurately grounding instruction-relevant elements from high-resolution screenshots cluttered with irrelevant UI components remains challenging for existing approaches. Inspired by how humans dynamically adjust their perceptual scope to locate task-related regions on complex screens, we propose DRS-GUI, a training-free dynamic region search framework for GUI grounding that can be seamlessly integrated into existing MLLMs. DRS-GUI introduces a lightweight UI Perceptor that performs three human-like perceptual actions (Focus, Shift, and Scatter) to progressively explore the interface and generate region proposals. To dynamically schedule these actions, we further design an Action Planner based on Monte Carlo Tree Search (MCTS). A region quality reward is employed to evaluate and select the highly instruction-relevant region, efficiently pruning redundant UI elements. Experiments demonstrate that DRS-GUI yields a 14\% improvement on ScreenSpot-Pro for general and GUI-specific MLLMs (Qwen2.5-VL-7B and UGround-V1-7B), significantly enhancing grounding performance and generalization.

81.5CVMay 14
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation

Zeyu Chen, Fangmin Zhao, Yan Shu et al.

Style-conditioned scene text generation faces unique challenges in extracting precise text styles from complex backgrounds and maintaining fine-grained style consistency across characters, especially for multilingual scripts. We propose StyleTextGen, a novel framework that learns to perceive and replicate visual text styles across different languages and writing systems. Our approach features three key contributions: First, we introduce a dual-branch style encoder dedicated to style modeling, yielding robust multilingual text style representations in complex real-world scenes. Second, we design a text style consistency loss that enhances style coherence and improves overall visual quality. Third, we develop a mask-guided inference strategy that ensures precise style alignment between generated and reference text. To facilitate systematic evaluation, we construct StyleText-CE, a bilingual scene text style benchmark covering both monolingual and cross-lingual settings. Extensive experiments demonstrate that StyleTextGen significantly outperforms existing methods in style consistency and cross-lingual generalization, establishing new state-of-the-art performance in multilingual style-conditioned text generation.

CVJan 20
Progressive self-supervised blind-spot denoising method for LDCT denoising

Yichao Liu, Yueyang Teng, Junwen Guo

Self-supervised learning is increasingly investigated for low-dose computed tomography (LDCT) image denoising, as it alleviates the dependence on paired normal-dose CT (NDCT) data, which are often difficult to acquire in clinical practice. In this paper, we propose a novel self-supervised training strategy that relies exclusively on LDCT images. We introduce a step-wise blind-spot denoising mechanism that enforces conditional independence in a progressive manner, enabling more fine-grained denoising learning. In addition, we add Gaussian noise to LDCT images, which acts as a regularization and mitigates overfitting. Extensive experiments on the Mayo LDCT dataset demonstrate that the proposed method consistently outperforms existing self-supervised approaches and achieves performance comparable to, or better than, several representative supervised denoising methods.

CVSep 30, 2024
DIG-FACE: De-biased Learning for Generalized Facial Expression Category Discovery

Tingzhang Luo, Yichao Liu, Yuanyuan Liu et al.

We introduce a novel task, Generalized Facial Expression Category Discovery (G-FACE), that discovers new, unseen facial expressions while recognizing known categories effectively. Even though there are generalized category discovery methods for natural images, they show compromised performance on G-FACE. We identified two biases that affect the learning: implicit bias, coming from an underlying distributional gap between new categories in unlabeled data and known categories in labeled data, and explicit bias, coming from shifted preference on explicit visual facial change characteristics from known expressions to unknown expressions. By addressing the challenges caused by both biases, we propose a Debiased G-FACE method, namely DIG-FACE, that facilitates the debiasing of both implicit and explicit biases. In the implicit debiasing process of DIG-FACE, we devise a novel learning strategy that aims at estimating and minimizing the upper bound of implicit bias. In the explicit debiasing process, we optimize the model's ability to handle nuanced visual facial expression data by introducing a hierarchical category-discrimination refinement strategy: sample-level, triplet-level, and distribution-level optimizations. Extensive experiments demonstrate that our DIG-FACE significantly enhances recognition accuracy for both known and new categories, setting a first-of-its-kind standard for the task.

CVNov 24, 2021Code
NAM: Normalization-based Attention Module

Yichao Liu, Zongru Shao, Yueyang Teng et al.

Recognizing less salient features is the key for model compression. However, it has not been investigated in the revolutionary attention mechanisms. In this work, we propose a novel normalization-based attention module (NAM), which suppresses less salient weights. It applies a weight sparsity penalty to the attention modules, thus, making them more computational efficient while retaining similar performance. A comparison with three other attention mechanisms on both Resnet and Mobilenet indicates that our method results in higher accuracy. Code for this paper can be publicly accessed at https://github.com/Christian-lyc/NAM.

CVSep 8, 2025
Hybrid Swin Attention Networks for Simultaneously Low-Dose PET and CT Denoising

Yichao Liu, Hengzhi Xue, YueYang Teng

Low-dose computed tomography (LDCT) and positron emission tomography (PET) have emerged as safer alternatives to conventional imaging modalities by significantly reducing radiation exposure. However, this reduction often results in increased noise and artifacts, which can compromise diagnostic accuracy. Consequently, denoising for LDCT/PET has become a vital area of research aimed at enhancing image quality while maintaining radiation safety. In this study, we introduce a novel Hybrid Swin Attention Network (HSANet), which incorporates Efficient Global Attention (EGA) modules and a hybrid upsampling module. The EGA modules enhance both spatial and channel-wise interaction, improving the network's capacity to capture relevant features, while the hybrid upsampling module mitigates the risk of overfitting to noise. We validate the proposed approach using a publicly available LDCT/PET dataset. Experimental results demonstrate that HSANet achieves superior denoising performance compared to existing methods, while maintaining a lightweight model size suitable for deployment on GPUs with standard memory configurations. This makes our approach highly practical for real-world clinical applications.

CVDec 10, 2021
Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions

Yichao Liu, Zongru Shao, Nico Hoffmann

A variety of attention mechanisms have been studied to improve the performance of various computer vision tasks. However, the prior methods overlooked the significance of retaining the information on both channel and spatial aspects to enhance the cross-dimension interactions. Therefore, we propose a global attention mechanism that boosts the performance of deep neural networks by reducing information reduction and magnifying the global interactive representations. We introduce 3D-permutation with multilayer-perceptron for channel attention alongside a convolutional spatial attention submodule. The evaluation of the proposed mechanism for the image classification task on CIFAR-100 and ImageNet-1K indicates that our method stably outperforms several recent attention mechanisms with both ResNet and lightweight MobileNet.