Yan Guo

CV
h-index6
8papers
43citations
Novelty56%
AI Score52

8 Papers

MAMay 20Code
Argo: Efficient Importance Labeling for Enterprise Email Systems

Siddhant Ray, Ganesh Ananthanarayanan, Kevin Chian et al.

Email importance labeling has long been a critical yet challenging problem for businesses and individuals. Traditional approaches; such as keyword matching, user-defined rules, and sender-based heuristics; demand extensive manual feature engineering and fail to scale effectively or generalize. Recent advances in large language models (LLMs) demonstrate strong potential and a natural fit for this task, offering deep contextual understanding and superior labeling quality. However, using LLM models like GPT-4.1 at enterprise email volumes incurs prohibitive computational costs and hinders real-world deployment. We explore the trade-off space of using alternative labeling schemes as opposed to GPT4.1 scale LLMs, with the goal of achieving near GPT level labeling quality with significantly lower cost. We develop Argo, an enterprise email labeling framework, where we construct a profiler to efficiently search the cost quality trade-off space of labeling and identify cost-efficient alternatives to labeling emails. Additionally, we design an on-demand provisioning scheme to intelligently scale Argo with real time load, to minimize cost increases during peak load inference. Over 3 open-source email datasets, Argo achieves 148-167X inference cost reduction with negligible quality degradation and 20-640000X lower profiling costs, making large-scale, context-aware email labeling practical for enterprises.

CVAug 20, 2024
EdgeNAT: Transformer for Efficient Edge Detection

Jinghuai Jie, Yan Guo, Guixing Wu et al.

Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. Especially, recent advancements present transformer with hierarchical structures such as Dilated Neighborhood Attention Transformer (DiNAT), demonstrating outstanding ability to efficiently capture both global and local features. However, transformers' application in edge detection has not been fully exploited. In this paper, we propose EdgeNAT, a one-stage transformer-based edge detector with DiNAT as the encoder, capable of extracting object boundaries and meaningful edges both accurately and efficiently. On the one hand, EdgeNAT captures global contextual information and detailed local cues with DiNAT, on the other hand, it enhances feature representation with a novel SCAF-MLA decoder by utilizing both inter-spatial and inter-channel relationships of feature maps. Extensive experiments on multiple datasets show that our method achieves state-of-the-art performance on both RGB and depth images. Notably, on the widely used BSDS500 dataset, our L model achieves impressive performances, with ODS F-measure and OIS F-measure of 86.0%, 87.6% for multi-scale input,and 84.9%, and 86.3% for single-scale input, surpassing the current state-of-the-art EDTER by 1.2%, 1.1%, 1.7%, and 1.6%, respectively. Moreover, as for throughput, our approach runs at 20.87 FPS on RTX 4090 GPU with single-scale input. The code for our method will be released soon.

CVFeb 6
Accelerating Vision Transformers on Brain Processing Unit

Jinchi Tang, Yan Guo

With the advancement of deep learning technologies, specialized neural processing hardware such as Brain Processing Units (BPUs) have emerged as dedicated platforms for CNN acceleration, offering optimized INT8 computation capabilities for convolutional operations. Meanwhile, Vision Transformer (ViT) models, such as the Data-efficient Image Transformer (DeiT), have demonstrated superior performance and play increasingly crucial roles in computer vision tasks. However, due to the architectural mismatch between CNN-optimized hardware and Vision Transformer computation characteristics--namely, that linear layers in Transformers operate on three-dimensional data while BPU acceleration is designed for four-dimensional convolution operations-it is difficult or even impossible to leverage BPU's advantages when deploying Vision Transformers. To address this challenge, we propose a novel approach that restructures the Vision Transformer by replacing linear layers and layer normalization operations with carefully designed convolutional operators. This enables DeiT to fully utilize the acceleration capabilities of BPUs, while allowing the original weight parameters to be inherited by the restructured models without retraining or fine-tuning. To the best of our knowledge, this is the first successful deployment of Vision Transformers that fully leverages BPU classification datasets demonstrate the effectiveness of our approach. Specifically, the quantized DeiT-Base model achieves 80.4% accuracy on ImageNet, compared to the original 81.8%, while obtaining up to a 3.8* inference speedup. Our finetuned DeiT model on the flower classification dataset also achieves excellent performance, with only a 0.5% accuracy drop for the DeiT-Base model, further demonstrating the effectiveness of our method.

CVNov 1, 2023
Neural Implicit Field Editing Considering Object-environment Interaction

Zhihong Zeng, Zongji Wang, Yuanben Zhang et al.

The 3D scene editing method based on neural implicit field has gained wide attention. It has achieved excellent results in 3D editing tasks. However, existing methods often blend the interaction between objects and scene environment. The change of scene appearance like shadows is failed to be displayed in the rendering view. In this paper, we propose an Object and Scene environment Interaction aware (OSI-aware) system, which is a novel two-stream neural rendering system considering object and scene environment interaction. To obtain illuminating conditions from the mixture soup, the system successfully separates the interaction between objects and scene environment by intrinsic decomposition method. To study the corresponding changes to the scene appearance from object-level editing tasks, we introduce a depth map guided scene inpainting method and shadow rendering method by point matching strategy. Extensive experiments demonstrate that our novel pipeline produce reasonable appearance changes in scene editing tasks. It also achieve competitive performance for the rendering quality in novel-view synthesis tasks.

BMMay 7, 2024
DrugLLM: Open Large Language Model for Few-shot Molecule Generation

Xianggen Liu, Yan Guo, Haoran Li et al.

Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they are struggling to capture the relationship between molecule structure and pharmacochemical properties. Consequently, the few-shot learning capacity of small-molecule drug modification remains impeded. In this work, we introduced DrugLLM, a LLM tailored for drug design. During the training process, we employed Group-based Molecular Representation (GMR) to represent molecules, arranging them in sequences that reflect modifications aimed at enhancing specific molecular properties. DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications. Extensive computational experiments demonstrate that DrugLLM can generate new molecules with expected properties based on limited examples, presenting a powerful few-shot molecule generation capacity.

SPApr 18
TimeRFT: Stimulating Generalizable Time Series Forecasting for TSFMs via Reinforcement Finetuning

Siyang Li, Yize Chen, Zijie Zhu et al.

Time Series Foundation Models (TSFMs) advance generalization and data efficiency in time series forecasting by unified large-scale pretraining. But TSFMs remain lacking when adapting to specific downstream forecasting tasks for two reasons. First, the non-stationary and uncertain nature of time series data lead to inevitable temporal distribution shifts between historical training and future testing data, while current Supervised FineTuning (SFT)-based methods are prone to overfitting and may degrade generalization. Second, training data availability varies across forecasting tasks, requiring TSFMs to generalize well under diverse data regimes. To address these challenges, we introduce the Time series Reinforcement Finetuning (TimeRFT) paradigm for TSFM downstream adaptation, which consists of two task-specific training recipes: i) A forecasting quality-based temporal reward mechanism that conducts a multi-faceted evaluation of the contribution of each prediction step to overall forecasting accuracy. ii) A forecasting difficulty-based data selection strategy to identify time series samples with generalizable predictive patterns and informative training signals. Extensive experiments demonstrate TimeRFT can consistently outperform SFT-based adaptation methods across various real-world forecasting tasks and training data regimes, enhancing prediction accuracy and generalization against unforeseen distribution shifts.

LGSep 29, 2025
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning

Siyang Li, Yize Chen, Yan Guo et al.

Advanced deep learning-based approaches have been actively applied to forecast the spatiotemporal physical dynamics governed by partial differential equations (PDEs), which acts as a critical procedure in tackling many science and engineering problems. As real-world physical environments like PDE system parameters are always capricious, how to generalize across unseen out-of-distribution (OOD) forecasting scenarios using limited training data is of great importance. To bridge this barrier, existing methods focus on discovering domain-generalizable representations across various PDE dynamics trajectories. However, their zero-shot OOD generalization capability remains deficient, since extra test-time samples for domain-specific adaptation are still required. This is because the fundamental physical invariance in PDE dynamical systems are yet to be investigated or integrated. To this end, we first explicitly define a two-fold PDE invariance principle, which points out that ingredient operators and their composition relationships remain invariant across different domains and PDE system evolution. Next, to capture this two-fold PDE invariance, we propose a physics-guided invariant learning method termed iMOOE, featuring an Invariance-aligned Mixture Of Operator Expert architecture and a frequency-enriched invariant learning objective. Extensive experiments across simulated benchmarks and real-world applications validate iMOOE's superior in-distribution performance and zero-shot generalization capabilities on diverse OOD forecasting scenarios.

LGMar 30, 2022
Identification of diffracted vortex beams at different propagation distances using deep learning

Heng Lv, Yan Guo, Zi-Xiang Yang et al.

Orbital angular momentum of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this article, we exploit an enhanced deep learning neural network to identify different OAM modes of light at multiple propagation distances with phase distortions. Specifically, our trained deep learning neural network can efficiently identify the vortex beam's topological charge and propagation distance with 97% accuracy. Our technique has important implications for OAM based communication and sensing protocols.