Liang Yuan

CV
h-index: 27
10 papers
14 citations
Novelty: 56%
AI Score: 51

10 Papers

50.8 | CV | Apr 8
Robust Mesh Saliency Ground Truth Acquisition in VR via View Cone Sampling and Manifold Diffusion

Guoquan Zheng, Jie Hao, Huiyu Duan et al.

As the complexity of 3D digital content grows exponentially, understanding human visual attention is critical for optimizing rendering and processing resources. Therefore, reliable 3D mesh saliency ground truth (GT) is essential for human-centric visual modeling in virtual reality (VR). However, existing VR eye-tracking frameworks are fundamentally bottlenecked by their underlying acquisition and generation mechanisms. The reliance on zero-area single ray sampling (SRS) fails to capture contextual features, leading to severe texture aliasing and discontinuous saliency signals. Moreover, the conventional application of Euclidean smoothing propagates saliency across disconnected physical gaps, resulting in semantic confusion on complex 3D manifolds. This paper proposes a robust framework to address these limitations. We first introduce a view cone sampling (VCS) strategy, which simulates the human foveal receptive field via Gaussian-distributed ray bundles to improve sampling robustness for complex topologies. Furthermore, a hybrid Manifold-Euclidean constrained diffusion (HCD) algorithm is developed, fusing manifold geodesic constraints with Euclidean scales to ensure topologically consistent saliency propagation. We demonstrate the improvement in performance over baseline methods and the benefits for downstream tasks through subjective experiments and both qualitative and quantitative evaluation. By mitigating "topological short-circuits" and aliasing, our framework provides a high-fidelity 3D attention acquisition paradigm that aligns with natural human perception, offering a more accurate and robust baseline for 3D mesh saliency research.
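The idea of replacing a single gaze ray with a Gaussian-distributed ray bundle can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_view_cone`, the default `sigma_deg` spread, and the bundle size are hypothetical choices made only for illustration.

```python
import math
import random

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sample_view_cone(gaze, n_rays=32, sigma_deg=1.0, seed=0):
    """Return n_rays unit directions scattered around the gaze direction.

    Angular offsets are drawn from a zero-mean Gaussian; sigma_deg is an
    assumed foveal spread, not a value taken from the paper.
    """
    rng = random.Random(seed)
    g = _normalize(gaze)
    # Pick a helper axis that is not parallel to the gaze direction.
    helper = (1.0, 0.0, 0.0) if abs(g[0]) < 0.9 else (0.0, 1.0, 0.0)
    # (u, v) span the plane perpendicular to the gaze direction.
    u = _normalize(_cross(g, helper))
    v = _cross(g, u)
    rays = []
    for _ in range(n_rays):
        # Independent Gaussian tilts along the two perpendicular axes.
        a = math.tan(math.radians(rng.gauss(0.0, sigma_deg)))
        b = math.tan(math.radians(rng.gauss(0.0, sigma_deg)))
        d = tuple(g[i] + a * u[i] + b * v[i] for i in range(3))
        rays.append(_normalize(d))
    return rays

rays = sample_view_cone((0.0, 0.0, 1.0), n_rays=16)
```

Each ray in the bundle would then be intersected with the mesh, so that the recorded fixation covers a small surface patch rather than a single zero-area point.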

CV | Dec 25, 2024 | Code
Embodied Image Quality Assessment for Robotic Intelligence

Jianbo Zhang, Chunyi Li, Jie Hao et al.

Image Quality Assessment (IQA) of User-Generated Content (UGC) is a critical technique for human Quality of Experience (QoE). However, does the image quality of Robot-Generated Content (RGC) demonstrate traits consistent with the Moravec paradox, potentially conflicting with human perceptual norms? Human subjective scoring is based more on the attractiveness of the image. Embodied agents are required to interact and perceive in the environment, and finally perform specific tasks. Visual images as inputs directly influence downstream tasks. In this paper, we explore the perception mechanism of embodied robots for image quality. We propose the first Embodied Preference Database (EPD), which contains 12,500 distorted image annotations. We establish assessment metrics based on the downstream tasks of robots. In addition, there is a gap between UGC and RGC. To address this, we propose a novel Multi-scale Attention Embodied Image Quality Assessment model called MA-EIQA. Built on the proposed EPD dataset, this is the first no-reference IQA model designed for embodied robots. Finally, the performance of mainstream IQA algorithms on the EPD dataset is verified. The experiments demonstrate that quality assessment of embodied images differs from that of humans. We sincerely hope that the EPD can contribute to the development of embodied AI by focusing on image quality assessment. The benchmark is available at https://github.com/Jianbo-maker/EPD_benchmark.

CV | Nov 24, 2025 | Code
Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling

Long Tang, Guoquan Zheng, Jie Hao et al.

Blind image quality assessment (BIQA) plays a crucial role in evaluating and optimizing visual experience. Most existing BIQA approaches fuse shallow and deep features extracted from backbone networks, while overlooking their unequal contributions to quality prediction. Moreover, while various vision encoder backbones are widely adopted in BIQA, effective quality decoding architectures remain underexplored. To address these limitations, this paper investigates the contributions of shallow and deep features to BIQA, and proposes an effective quality feature decoding framework via GCN-enhanced layer interaction and MoE-based feature decoupling, termed Life-IQA. Specifically, the GCN-enhanced layer interaction module uses the GCN-enhanced deepest-layer features as the query and the penultimate-layer features as the key and value, then performs cross-attention to achieve feature interaction. Moreover, a MoE-based feature decoupling module is proposed to decouple fused representations through different experts specialized for specific distortion types or quality dimensions. Extensive experiments demonstrate that Life-IQA shows a more favorable balance between accuracy and cost than a vanilla Transformer decoder and achieves state-of-the-art performance on multiple BIQA benchmarks. The code is available at https://github.com/TANGLONG2/Life-IQA/tree/main.

72.4 | DC | Apr 27
Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales

Haozhi Han, Ruge Zhang, Haoquan Chen et al.

Lifetime prediction of reactor pressure vessel (RPV) steel requires bridging atomistic degradation mechanisms with service-scale spatial and temporal regimes, from Angstroms and picoseconds to meters and decades. Existing engineering-scale models provide long-range reach but rely on fitted degradation laws, while recent atomistic kinetic Monte Carlo (AKMC) advances still fail to achieve year-and-meter-scale coverage. We present AtomWorld, an atomistic world-modeling framework for RPV steel lifetime simulation co-designed with leadership-scale supercomputing through three tightly coupled layers: (1) algorithm: AtomWorld recasts classical AKMC as an atomistic world model that learns consequence-aware state transitions over the ab initio energy landscape; (2) HPC: it co-designs this formulation with modern supercomputers, yielding a compute-dense, synchronization-light, and communication-efficient execution pipeline; and (3) application: it extends atomistic world modeling to engineering-scale simulation through a physically grounded voxel-parallel framework, offering a scalable pathway from local atomistic dynamics to engineering-scale degradation evolution. We demonstrate a paradigm shift in atomistic simulation: AtomWorld enables atomistic simulation of RPV steel across year-and-meter scales for the first time, extending direct atomistic modeling to ten-quintillion-atom systems and achieving a time-to-solution of 1.71 days for one simulated service year. These capabilities are sustained across five leadership supercomputers with 92-97% scaling efficiency and peak performance up to 1.27 EFLOP/s, corresponding to 48% of the Lineshine peak FP64 performance.

CV | Jan 2, 2024
ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

Dingkun Yan, Liang Yuan, Erwin Wu et al.

Diffusion models have recently demonstrated their effectiveness in generating extremely high-quality images and are now utilized in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, there has been limited exploration of the potential conflicts between image prompts and sketch inputs, which can lead to severe deterioration in the results. Therefore, this paper exhaustively investigates reference-based sketch colorization models that aim to colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the "distribution problem", which is a major shortcoming compared to text-based counterparts, and the capability in zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder and propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs. We conduct comprehensive evaluations of our models through qualitative and quantitative experiments as well as a user study.

IR | Aug 9, 2025
CLAP: Coreference-Linked Augmentation for Passage Retrieval

Huanwei Xu, Lin Xu, Liang Yuan

Large Language Model (LLM)-based passage expansion has shown promise for enhancing first-stage retrieval, but often underperforms with dense retrievers due to semantic drift and misalignment with their pretrained semantic space. Beyond this, only a portion of a passage is typically relevant to a query, while the rest introduces noise--an issue compounded by chunking techniques that break coreference continuity. We propose Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent chunks, resolves coreference chains, and generates localized pseudo-queries aligned with dense retriever representations. A simple fusion of global topical signals and fine-grained subtopic signals achieves robust performance across domains. CLAP yields consistent gains even as retriever strength increases, enabling dense retrievers to match or surpass second-stage rankers such as BM25 + MonoT5-3B, with up to 20.68% absolute nDCG@10 improvement. These improvements are especially notable in out-of-domain settings, where conventional LLM-based expansion methods relying on domain knowledge often falter. CLAP instead adopts a logic-centric pipeline that enables robust, domain-agnostic generalization.
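For readers unfamiliar with the nDCG@10 metric quoted above, a minimal sketch of one common formulation follows. It assumes linear gain and a log2 position discount, and it computes the ideal ranking only from the relevance labels passed in; evaluation toolkits may differ on both points, and the abstract does not state which variant was used.

```python
import math

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """DCG normalized by the DCG of the ideally sorted label list."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# A single relevant passage ranked second instead of first.
score = ndcg_at_k([0, 1, 0])
```

An "absolute" improvement in nDCG@10 is a difference between two such scores expressed in percentage points, not a relative change.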

LG | Apr 22, 2021
An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

Kun Li, Liang Yuan, Yunquan Zhang et al.

As the data size in machine learning grows exponentially, it becomes necessary to accelerate computation by utilizing the ever-growing number of cores provided by high-performance computing hardware. However, existing parallel methods for clustering or regression often suffer from low accuracy, slow convergence, and complex hyperparameter tuning. Furthermore, parallel efficiency is usually difficult to improve while striking a balance between preserving model properties and partitioning computing workloads on distributed systems. In this paper, we propose a novel and simple data structure capturing the most important information among data samples. It has several advantageous properties: it supports a hierarchical clustering strategy that is independent of the hardware parallelism, well-defined metrics for determining the optimal clustering, balanced partitioning for maintaining the compactness property, and efficient parallelization for accelerating computation phases. We then combine the clustering with regression techniques as a parallel library and utilize a hybrid structure of data and model parallelism to make predictions. Experiments illustrate that our library obtains remarkable performance in convergence, accuracy, and scalability.

IR | Feb 21, 2021
A Concept Knowledge-Driven Keywords Retrieval Framework for Sponsored Search

Yijiang Lian, Yubo Liu, Zhicong Ye et al.

In sponsored search, retrieving synonymous keywords for the exact match type is important for accurately targeted advertising. A data-driven deep learning-based method has been proposed to tackle this problem. An apparent disadvantage of this method is its poor generalization performance on entity-level long-tail instances, even though they might share similar concept-level patterns with frequent instances. With the help of a large knowledge base, we find that most commercial synonymous query-keyword pairs can be abstracted into meaningful conceptual patterns through concept tagging. Based on this fact, we propose a novel knowledge-driven conceptual retrieval framework to mitigate this problem, which consists of three parts: data conceptualization, matching via conceptual patterns, and concept-augmented discrimination. Both offline and online experiments show that our method is very effective. This framework has been successfully applied to Baidu's sponsored search system, where it yields a significant improvement in revenue.

LG | Aug 5, 2020
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

Yijiang Lian, Zhijie Chen, Xin Pei et al.

An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value might retrieve so many ad candidates that the ranking module cannot afford to process them. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is how to pick the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme must adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to solve this problem. Our approach treats the downstream as a black-box environment: the agent sequentially selects items and feeds them into the downstream, where revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time that system optimization has been considered from a downstream adaptation view. It is also the first time that reinforcement learning techniques have been used to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and a long-running online A/B test shows remarkable improvements in revenue.
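The sequential pick-K-from-N setup with a black-box reward can be sketched with a plain REINFORCE update over per-item logits. This is an assumption-laden toy, not the method deployed in the paper: the softmax sampling without replacement, the moving-average baseline, the function names, and the `toy_revenue` stand-in for the downstream are all hypothetical.

```python
import math
import random

def sample_k(logits, k, rng):
    """Sequentially sample k distinct items.

    Returns the picked indices and the gradient of the log-probability
    of the whole selection with respect to each logit.
    """
    remaining = list(range(len(logits)))
    grad = [0.0] * len(logits)
    picks = []
    for _ in range(k):
        # Softmax over the items that are still available.
        m = max(logits[j] for j in remaining)
        exps = [math.exp(logits[j] - m) for j in remaining]
        z = sum(exps)
        probs = [e / z for e in exps]
        idx = rng.choices(range(len(remaining)), weights=probs)[0]
        # d log p / d logit_j = 1[j chosen] - p(j) for available items.
        for pos, j in enumerate(remaining):
            grad[j] += (1.0 if pos == idx else 0.0) - probs[pos]
        picks.append(remaining.pop(idx))
    return picks, grad

def train(reward_fn, n_items, k, episodes=500, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline; reward_fn is a black box."""
    rng = random.Random(seed)
    logits = [0.0] * n_items
    baseline = 0.0
    for _ in range(episodes):
        picks, grad = sample_k(logits, k, rng)
        reward = reward_fn(picks)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        logits = [t + lr * advantage * g for t, g in zip(logits, grad)]
    return logits

# Hypothetical downstream: revenue is the sum of hidden item values.
hidden_values = [0.1, 0.9, 0.3, 0.8, 0.2]

def toy_revenue(picks):
    return sum(hidden_values[j] for j in picks)

learned_logits = train(toy_revenue, n_items=5, k=2)
```

The policy never inspects `hidden_values` directly; it only observes the reward returned for each selection, which mirrors the black-box treatment of the downstream described in the abstract.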

AP | Mar 3, 2018
Enhancement of land-use change modeling using convolutional neural networks and convolutional denoising autoencoders

Guodong Du, Liang Yuan, Kong Joo Shin et al.

The neighborhood effect is a key driving factor for the land-use change (LUC) process. This study applies convolutional neural networks (CNN) to capture neighborhood characteristics from satellite images and to enhance the performance of LUC modeling. We develop a hybrid CNN model (conv-net) to predict the LU transition probability by combining satellite images and geographical features. A spatial weight layer is designed to incorporate the distance-decay characteristics of neighborhood effect into conv-net. As an alternative model, we also develop a hybrid convolutional denoising autoencoder and multi-layer perceptron model (CDAE-net), which specifically learns latent representations from satellite images and denoises the image data. Finally, a DINAMICA-based cellular automata (CA) model simulates the LU pattern. The results show that the convolutional-based models improve the modeling performances compared with a model that accepts only the geographical features. Overall, conv-net outperforms CDAE-net in terms of LUC predictive performance. Nonetheless, CDAE-net performs better when the data are noisy.
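The distance-decay idea behind the spatial weight layer can be illustrated with a small sketch. The exponential decay form, the `decay` parameter, and the function names are assumptions made for illustration; the abstract does not specify the functional form used in conv-net.

```python
import math

def distance_decay_weights(size=5, decay=1.0):
    """Build a size x size weight grid that decays with distance from the center.

    Uses w = exp(-d / decay), where d is the Euclidean distance in cells
    (an assumed form, chosen only for illustration).
    """
    c = size // 2
    return [[math.exp(-math.hypot(r - c, col - c) / decay)
             for col in range(size)]
            for r in range(size)]

def apply_weights(patch, weights):
    """Scale each cell of a neighborhood patch by its spatial weight."""
    return [[p * w for p, w in zip(patch_row, weight_row)]
            for patch_row, weight_row in zip(patch, weights)]

weights = distance_decay_weights(size=5, decay=2.0)
patch = [[1.0] * 5 for _ in range(5)]
weighted = apply_weights(patch, weights)
```

Cells near the center of the neighborhood keep most of their influence, while cells toward the corners are attenuated, which is the qualitative behavior a distance-decay layer is meant to encode.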