Wen Ye

LG
h-index29
10papers
402citations
Novelty45%
AI Score53

10 Papers

LGOct 8, 2023
TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

Defu Cao, Furong Jia, Sercan O Arik et al.

The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the design of prompts to facilitate distribution adaptation in different types of time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on zero shot setting for a number of time series benchmark datasets. This performance gain is observed not only in scenarios involving previously unseen datasets but also in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.

LGSep 3, 2024
TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model

Defu Cao, Wen Ye, Yizhou Zhang et al.

Foundation models, particularly Large Language Models (LLMs), have revolutionized text and video processing, yet time series data presents distinct challenges for such approaches due to domain-specific features such as missing values, multi-resolution characteristics, etc. Furthermore, the de-facto autoregressive transformers tend to learn deterministic temporal dependencies within pre-trained data while overlooking inherent uncertainties and lacking integration of physical constraints. In this paper, we introduce TimeDiT, a diffusion transformer model that synergistically combines transformer-based temporal dependency learning with diffusion-based probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference process across diverse tasks while introducing a theoretically grounded, finetuning-free model editing strategy that enables flexible integration of external knowledge during sampling. Acknowledging the challenges of unifying multiple downstream tasks under a single model, our systematic evaluation demonstrates TimeDiT's effectiveness both in fundamental tasks, i.e., forecasting and imputation, through zero-shot/fine-tuning; and in domain tasks, i.e., multi-resolution forecasting, anomaly detection, and data generation, establishing it as a \textit{proto-foundation model} that bridges the gap between general-purpose and domain-specific models.

SIOct 10, 2022
Region2Vec: Community Detection on Spatial Networks Using Graph Embedding with Node Attributes and Spatial Interactions

Yunlei Liang, Jiawei Zhu, Wen Ye et al.

Community Detection algorithms are used to detect densely connected components in complex networks and reveal underlying relationships among components. As a special type of networks, spatial networks are usually generated by the connections among geographic regions. Identifying the spatial network communities can help reveal the spatial interaction patterns, understand the hidden regional structures and support regional development decision-making. Given the recent development of Graph Convolutional Networks (GCN) and its powerful performance in identifying multi-scale spatial interactions, we proposed an unsupervised GCN-based community detection method "region2vec" on spatial networks. Our method first generates node embeddings for regions that share common attributes and have intense spatial interactions, and then applies clustering algorithms to detect communities based on their embedding similarity and spatial adjacency. Experimental results show that while existing methods trade off either attribute similarities or spatial interactions for one another, "region2vec" maintains a great balance between both and performs the best when one wants to maximize both attribute similarities and spatial interactions within communities.

LGMar 17
Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

Wen Ye, Wei Yang, Defu Cao et al.

Time series analysis is crucial in real-world applications, yet traditional methods focus on isolated tasks only, and recent studies on time series reasoning remain limited to either single-step inference or are constrained to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system's capabilities along two axes: (1) fundamental time series understanding assessed by TimeSeriesExam and (2) complex, multi-step inference evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding as well as the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.

AIJan 29
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models

Bowen Fang, Wen Ye, Yunyue Su et al.

Prevalent retrieval-based tool-use pipelines struggle with a dual semantic challenge: their retrievers often employ encoders that fail to capture complex semantics, while the Large Language Model (LLM) itself lacks intrinsic tool knowledge from its natural language pretraining. Generative methods offer a powerful alternative by unifying selection and execution, tasking the LLM to directly learn and generate tool identifiers. However, the common practice of mapping each tool to a unique new token introduces substantial limitations: it creates a scalability and generalization crisis, as the vocabulary size explodes and each tool is assigned a semantically isolated token. This approach also creates a semantic bottleneck that hinders the learning of collaborative tool relationships, as the model must infer them from sparse co-occurrences of monolithic tool IDs within a vast library. To address these limitations, we propose ToolWeaver, a novel generative tool learning framework that encodes tools into hierarchical sequences. This approach makes vocabulary expansion logarithmic to the number of tools. Crucially, it enables the model to learn collaborative patterns from the dense co-occurrence of shared codes, rather than the sparse co-occurrence of monolithic tool IDs. We generate these structured codes through a novel tokenization process designed to weave together a tool's intrinsic semantics with its extrinsic co-usage patterns. These structured codes are then integrated into the LLM through a generative alignment stage, where the model is fine-tuned to produce the hierarchical code sequences. Evaluation results with nearly 47,000 tools show that ToolWeaver significantly outperforms state-of-the-art methods, establishing a more scalable, generalizable, and semantically-aware foundation for advanced tool-augmented agents.

CLJul 8, 2024
ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

Zixin Shu, Rui Hua, Dengying Yan et al.

Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct an Integrated Ontology of symptom phenotypes (ISPO) to support the data mining of Chinese EMRs and real-world study in TCM field. Methods: To construct an integrated ontology of symptom phenotypes (ISPO), we manually annotated classical TCM textbooks and large-scale Chinese electronic medical records (EMRs) to collect symptom terms with support from a medical text annotation system. Furthermore, to facilitate the semantic interoperability between different terminologies, we incorporated public available biomedical vocabularies by manual mapping between Chinese terms and English terms with cross-references to source vocabularies. In addition, we evaluated the ISPO using independent clinical EMRs to provide a high-usable medical ontology for clinical data analysis. Results: By integrating 78,696 inpatient cases of EMRs, 5 biomedical vocabularies, 21 TCM books and dictionaries, ISPO provides 3,147 concepts, 23,475 terms, and 55,552 definition or contextual texts. Adhering to the taxonomical structure of the related anatomical systems of symptom phenotypes, ISPO provides 12 top-level categories and 79 middle-level sub-categories. The validation of data analysis showed the ISPO has a coverage rate of 95.35%, 98.53% and 92.66% for symptom terms with occurrence rates of 0.5% in additional three independent curated clinical datasets, which can demonstrate the significant value of ISPO in mapping clinical terms to ontologies.

CLJan 16, 2025Code
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

Zhaocheng Liu, Quan Tu, Wen Ye et al.

Recently, large language models have shown great potential to transform online medical consultation. Despite this, most research targets improving diagnostic accuracy with ample information, often overlooking the inquiry phase. Some studies try to evaluate or refine doctor models by using prompt-engineered patient agents. However, prompt engineering alone falls short in accurately simulating real patients. We need to explore new paradigms for patient simulation. Furthermore, the relationship between inquiry and diagnosis remains unexplored. This paper extracts dialogue strategies from real doctor-patient conversations to guide the training of a patient simulator. Our simulator shows higher anthropomorphism and lower hallucination rates, using dynamic dialogue strategies. This innovation offers a more accurate evaluation of diagnostic models and generates realistic synthetic data. We conduct extensive experiments on the relationship between inquiry and diagnosis, showing they adhere to Liebig's law: poor inquiry limits diagnosis effectiveness, regardless of diagnostic skill, and vice versa. The experiments also reveal substantial differences in inquiry performance among models. To delve into this phenomenon, the inquiry process is categorized into four distinct types. Analyzing the distribution of inquiries across these types helps explain the performance differences. The weights of our patient simulator are available https://github.com/PatientSimulator/PatientSimulator.

LGSep 1, 2025Code
When LLM Meets Time Series: Can LLMs Perform Multi-Step Time Series Reasoning and Inference

Wen Ye, Jinbo Liu, Defu Cao et al.

The rapid advancement of Large Language Models (LLMs) has sparked growing interest in their application to time series analysis tasks. However, their ability to perform complex reasoning over temporal data in real-world application domains remains underexplored. To move toward this goal, a first step is to establish a rigorous benchmark dataset for evaluation. In this work, we introduce the TSAIA Benchmark, a first attempt to evaluate LLMs as time-series AI assistants. To ensure both scientific rigor and practical relevance, we surveyed over 20 academic publications and identified 33 real-world task formulations. The benchmark encompasses a broad spectrum of challenges, ranging from constraint-aware forecasting to anomaly detection with threshold calibration: tasks that require compositional reasoning and multi-step time series analysis. The question generator is designed to be dynamic and extensible, supporting continuous expansion as new datasets or task types are introduced. Given the heterogeneous nature of the tasks, we adopt task-specific success criteria and tailored inference-quality metrics to ensure meaningful evaluation for each task. We apply this benchmark to assess eight state-of-the-art LLMs under a unified evaluation protocol. Our analysis reveals limitations in current models' ability to assemble complex time series analysis workflows, underscoring the need for specialized methodologies for domain-specific adaptation. Our benchmark is available at https://huggingface.co/datasets/Melady/TSAIA, and the code is available at https://github.com/USC-Melady/TSAIA.

CVOct 8, 2025Code
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation

Wen Ye, Zhaocheng Liu, Yuwei Gui et al.

Text-to-image synthesis has made remarkable progress, yet accurately interpreting complex and lengthy prompts remains challenging, often resulting in semantic inconsistencies and missing details. Existing solutions, such as fine-tuning, are model-specific and require training, while prior automatic prompt optimization (APO) approaches typically lack systematic error analysis and refinement strategies, resulting in limited reliability and effectiveness. Meanwhile, test-time scaling methods operate on fixed prompts and on noise or sample numbers, limiting their interpretability and adaptability. To solve these, we introduce a flexible and efficient test-time prompt optimization strategy that operates directly on the input text. We propose a plug-and-play multi-agent system called GenPilot, integrating error analysis, clustering-based adaptive exploration, fine-grained verification, and a memory module for iterative optimization. Our approach is model-agnostic, interpretable, and well-suited for handling long and complex prompts. Simultaneously, we summarize the common patterns of errors and the refinement strategy, offering more experience and encouraging further exploration. Experiments on DPG-bench and Geneval with improvements of up to 16.9% and 5.7% demonstrate the strong capability of our methods in enhancing the text and image consistency and structural coherence of generated images, revealing the effectiveness of our test-time prompt optimization strategy. The code is available at https://github.com/27yw/GenPilot.

SINov 23, 2024
GeoAI-Enhanced Community Detection on Spatial Networks with Graph Deep Learning

Yunlei Liang, Jiawei Zhu, Wen Ye et al.

Spatial networks are useful for modeling geographic phenomena where spatial interaction plays an important role. To analyze the spatial networks and their internal structures, graph-based methods such as community detection have been widely used. Community detection aims to extract strongly connected components from the network and reveal the hidden relationships between nodes, but they usually do not involve the attribute information. To consider edge-based interactions and node attributes together, this study proposed a family of GeoAI-enhanced unsupervised community detection methods called region2vec based on Graph Attention Networks (GAT) and Graph Convolutional Networks (GCN). The region2vec methods generate node neural embeddings based on attribute similarity, geographic adjacency and spatial interactions, and then extract network communities based on node embeddings using agglomerative clustering. The proposed GeoAI-based methods are compared with multiple baselines and perform the best when one wants to maximize node attribute similarity and spatial interaction intensity simultaneously within the spatial network communities. It is further applied in the shortage area delineation problem in public health and demonstrates its promise in regionalization problems.