LG | Aug 16, 2023 | Code
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
Ching Chang, Wei-Yao Wang, Wen-Chih Peng et al.
Multivariate time-series forecasting is vital in various domains, e.g., economic planning and weather prediction. Deep train-from-scratch models have exhibited effective performance yet require large amounts of data, which limits real-world applicability. Recently, researchers have leveraged the representation learning transferability of pre-trained Large Language Models (LLMs) to handle limited non-linguistic datasets effectively. However, incorporating LLMs with time-series data presents challenges of limited adaptation due to different compositions between time-series and linguistic data, and the inability to process multi-scale temporal information. To tackle these challenges, we propose LLM4TS, a framework for time-series forecasting with pre-trained LLMs. LLM4TS consists of a two-stage fine-tuning strategy: the time-series alignment stage to align LLMs with the nuances of time-series data, and the forecasting fine-tuning stage for downstream time-series forecasting tasks. Furthermore, our framework features a novel two-level aggregation method that integrates multi-scale temporal data within pre-trained LLMs, enhancing their ability to interpret time-specific information. In experiments across 7 time-series forecasting datasets, LLM4TS is superior to existing state-of-the-art methods compared with trained-from-scratch models in full-shot scenarios, and also achieves the highest rank in few-shot scenarios. In addition, evaluations compared with different unsupervised representation learning approaches highlight LLM4TS's effectiveness with representation learning in forecasting tasks. Ablation studies further validate each component's contribution to LLM4TS and underscore the essential role of utilizing LLM's pre-trained weights for optimal performance. The code is available at https://github.com/blacksnail789521/LLM4TS.
CL | Feb 12, 2023 | Code
Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification
Wei-Wei Du, Hong-Wei Wu, Wei-Yao Wang et al.
Multi-modal fact verification has become an important but challenging issue on social media due to the mismatch between the text and images in the misinformation of news content, which has been addressed by considering cross-modalities to identify the veracity of the news in recent years. In this paper, we propose the Pre-CoFactv2 framework with new parameter-efficient foundation models for modeling fine-grained text and input embeddings with lightening parameters, multi-modal multi-type fusion for not only capturing relations for the same and different modalities but also for different types (i.e., claim and document), and feature representations for explicitly providing metadata for each sample. In addition, we introduce a unified ensemble method to boost model performance by adjusting the importance of each trained model with not only the weights but also the powers. Extensive experiments show that Pre-CoFactv2 outperforms Pre-CoFact by a large margin and achieved new state-of-the-art results at the Factify challenge at AAAI 2023. We further illustrate model variations to verify the relative contributions of different components. Our team won the first prize (F1-score: 81.82%) and we made our code publicly available at https://github.com/wwweiwei/Pre-CoFactv2-AAAI-2023.
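The ensemble idea in this abstract, weighting each trained model by both a weight and a power, can be illustrated with a minimal sketch. The combination rule below (sum of `w_i * p_i ** a_i`, then renormalization) and the function name `power_weighted_ensemble` are assumptions made for illustration; the abstract does not give the exact formula used in Pre-CoFactv2.

```python
from typing import List, Sequence


def power_weighted_ensemble(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    powers: Sequence[float],
) -> List[float]:
    """Combine per-model class probabilities as sum_i w_i * p_i ** a_i.

    Hypothetical combination rule (an assumption, not the paper's formula).
    model_probs[i][c] is model i's probability for class c.
    """
    if not (len(model_probs) == len(weights) == len(powers)):
        raise ValueError("one weight and one power are required per model")
    num_classes = len(model_probs[0])
    scores = [0.0] * num_classes
    for probs, weight, power in zip(model_probs, weights, powers):
        for c, p in enumerate(probs):
            # A power above 1 sharpens a model's distribution; below 1 flattens it.
            scores[c] += weight * (p ** power)
    total = sum(scores)
    if total == 0.0:
        raise ValueError("all combined scores are zero")
    # Renormalize so the combined scores sum to 1.
    return [s / total for s in scores]


if __name__ == "__main__":
    probs = [[0.6, 0.4], [0.2, 0.8]]
    print(power_weighted_ensemble(probs, weights=[1.0, 1.0], powers=[2.0, 2.0]))
```

In such a scheme the weights and powers would typically be tuned on a validation set; that tuning step is not shown here.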
AI | Jun 27, 2023 | Code
Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset
Wei-Yao Wang, Wei-Wei Du, Wen-Chih Peng et al.
In recent years, badminton analytics has drawn attention due to the advancement of artificial intelligence and the efficiency of data collection. While there is a line of effective applications to improve and investigate player performance, there are only a few public badminton datasets that can be used by researchers outside the badminton domain. Existing badminton singles datasets focus on specific matchups; however, they cannot provide comprehensive studies on different players and various matchups. In this paper, we provide a badminton singles dataset, ShuttleSet22, which is collected from high-ranking matches in 2022. ShuttleSet22 consists of 30,172 strokes in 2,888 rallies in the training set, 1,400 strokes in 450 rallies in the validation set, and 2,040 strokes in 654 rallies in the testing set, with detailed stroke-level metadata within a rally. To benchmark existing work with ShuttleSet22, we hold a challenge, Track 2: Forecasting Future Turn-Based Strokes in Badminton Rallies, at CoachAI Badminton Challenge @ IJCAI 2023, to encourage researchers to tackle this real-world problem through innovative approaches and to summarize insights between the state-of-the-art baseline and improved techniques, exchanging inspiring ideas. The baseline codes and the dataset are made available at https://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023.
LG | Jul 17, 2024 | Code
COKE: Causal Discovery with Chronological Order and Expert Knowledge in High Proportion of Missing Manufacturing Data
Ting-Yun Ou, Ching Chang, Wen-Chih Peng
Understanding causal relationships between machines is crucial for fault diagnosis and optimization in manufacturing processes. Real-world datasets frequently exhibit up to 90% missing data and high dimensionality from hundreds of sensors. These datasets also include domain-specific expert knowledge and chronological order information, reflecting the recording order across different machines, which is pivotal for discerning causal relationships within the manufacturing data. However, previous methods for handling missing data in scenarios akin to real-world conditions have not been able to effectively utilize expert knowledge. Conversely, prior methods that can incorporate expert knowledge struggle with datasets that exhibit missing values. Therefore, we propose COKE to construct causal graphs in manufacturing datasets by leveraging expert knowledge and chronological order among sensors without imputing missing data. Utilizing the characteristics of the recipe, we maximize the use of samples with missing values, derive embeddings from intersections with an initial graph that incorporates expert knowledge and chronological order, and create a sensor ordering graph. The graph-generating process has been optimized by an actor-critic architecture to obtain a final graph that has a maximum reward. Experimental evaluations in diverse settings of sensor quantities and missing proportions demonstrate that our approach compared with the benchmark methods shows an average improvement of 39.9% in the F1-score. Moreover, the F1-score improvement can reach 62.6% when considering the configuration similar to real-world datasets, and 85.0% in real-world semiconductor datasets. The source code is available at https://github.com/OuTingYun/COKE.
LG | Jun 8, 2023
ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis
Wei-Yao Wang, Yung-Chang Huang, Tsi-Ui Ik et al.
With the recent progress in sports analytics, deep learning approaches have demonstrated the effectiveness of mining insights into players' tactics for improving performance quality and fan engagement. This is attributed to the availability of public ground-truth datasets. While there are a few available datasets for turn-based sports for action detection, these datasets severely lack structured source data and stroke-level records since these require high-cost labeling efforts from domain experts and are hard to detect using automatic techniques. Consequently, the development of artificial intelligence approaches is significantly hindered when existing models are applied to more challenging structured turn-based sequences. In this paper, we present ShuttleSet, the largest publicly-available badminton singles dataset with annotated stroke-level records. It contains 104 sets, 3,685 rallies, and 36,492 strokes in 44 matches between 2018 and 2021 with 27 top-ranking men's singles and women's singles players. ShuttleSet is manually annotated with a computer-aided labeling tool to increase the labeling efficiency and effectiveness of selecting the shot type with a choice of 18 distinct classes, the corresponding hitting locations, and the locations of both players at each stroke. In the experiments, we provide multiple benchmarks (i.e., stroke influence, stroke forecasting, and movement forecasting) with baselines to illustrate the practicability of using ShuttleSet for turn-based analytics, which is expected to stimulate both academic and sports communities. Over the past two years, a visualization platform has been deployed to illustrate the variability of analysis cases from ShuttleSet for coaches to delve into players' tactical preferences with human-interactive interfaces, which was also used by national badminton teams during multiple international high-ranking matches.
LG | Nov 22, 2022
Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton
Kai-Shiang Chang, Wei-Yao Wang, Wen-Chih Peng
Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of returning strokes will be made, and where players will move to based on previous strokes. Although this problem has not been addressed to date, movement forecasting can be tackled with sequence-based and graph-based models by formulating it as a sequence prediction task. However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture the mutual interactions of players themselves and between both players within a rally, and dynamic players' tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrate the practical usage of movement forecasting.
LG | Dec 23, 2022
Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal
Chih-Chia Li, Wei-Yao Wang, Wei-Wei Du et al.
Real estate appraisal is a crucial issue for urban applications, which aims to value the properties on the market. Traditional methods perform appraisal based on the domain knowledge, but suffer from the efforts of hand-crafted design. Recently, several methods have been developed to automatize the valuation process by taking the property trading transaction into account when estimating the property value. However, existing methods only consider the real estate itself, ignoring the relation between the properties. Moreover, naively aggregating the information of neighbors fails to model the relationships between the transactions. To tackle these limitations, we propose a novel Neighbor Relation Graph Learning Framework (ReGram) by incorporating the relation between target transaction and surrounding neighbors with the attention mechanism. To model the influence between communities, we integrate the environmental information and the past price of each transaction from other communities. Moreover, since the target transactions in different regions share some similarities and differences of characteristics, we introduce a dynamic adapter to model the different distributions of the target transactions based on the input-related kernel weights. Extensive experiments on the real-world dataset with various scenarios demonstrate that ReGram robustly outperforms the state-of-the-art methods. Furthermore, comprehensive ablation studies were conducted to examine the effectiveness of each component in ReGram.
LG | Nov 22, 2022
A Reinforcement Learning Badminton Environment for Simulating Player Tactics (Student Abstract)
Li-Chun Huang, Nai-Zen Hseuh, Yen-Che Chien et al.
Recent techniques for analyzing sports precisely have stimulated various approaches to improve player performance and fan engagement. However, existing approaches are only able to evaluate offline performance, since testing in real-time matches is prohibitively costly and cannot be replicated. To test in a safe and reproducible simulator, we focus on turn-based sports and introduce a badminton environment by simulating rallies with different angles of view and designing the states, actions, and training procedures. This benefits not only coaches and players, who can simulate past matches for tactic investigation, but also researchers, who can rapidly evaluate their novel algorithms.
LG | Feb 2, 2024 | Code
Root Cause Analysis In Microservice Using Neural Granger Causal Discovery
Cheng-Ming Lin, Ching Chang, Wei-Yao Wang et al.
In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships in microservices when facing system malfunctions. Previous research employed structured learning methods (e.g., PC-algorithm) to establish causal relationships and derive root causes from causal graphs. Nevertheless, they ignored the temporal order of time series data and failed to leverage the rich information inherent in the temporal relationships. For instance, in cases where there is a sudden spike in CPU utilization, it can lead to an increase in latency for other microservices. However, in this scenario, the anomaly in CPU utilization occurs before the latency increase, rather than simultaneously. As a result, the PC-algorithm fails to capture such characteristics. To address these challenges, we propose RUN, a novel approach for root cause analysis using neural Granger causal discovery with contrastive learning. RUN enhances the backbone encoder by integrating contextual information from time series, and leverages a time series forecasting model to conduct neural Granger causal discovery. In addition, RUN incorporates Pagerank with a personalization vector to efficiently recommend the top-k root causes. Extensive experiments conducted on the synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms the state-of-the-art root cause analysis methods. Moreover, we provide an analysis scenario for the sock-shop case to showcase the practicality and efficacy of RUN in microservice-based applications. Our code is publicly available at https://github.com/zmlin1998/RUN.
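The last step described above, ranking candidate root causes with PageRank and a personalization vector, can be sketched with a plain power iteration. This is a generic personalized PageRank, not RUN's implementation: the edge direction (from an affected metric to its candidate cause), the metric names, and the uniform personalization scores are all assumptions made for illustration.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    personalization: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank with a personalized restart distribution.

    graph[src] lists the nodes that src points to; every target must also
    be a key of graph. Assumed direction: affected metric -> candidate cause.
    """
    nodes = list(graph)
    total = sum(personalization.get(n, 0.0) for n in nodes)
    if total <= 0.0:
        raise ValueError("personalization must have positive mass")
    restart = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Restart mass goes to nodes in proportion to the personalization vector.
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            targets = graph[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: redistribute its mass via the restart distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * restart[n]
        rank = new_rank
    return rank


def top_k(rank: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring nodes."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical causal graph: errors depend on latency, latency on CPU.
    causal_graph = {"errors": ["latency"], "latency": ["cpu"], "cpu": []}
    scores = personalized_pagerank(causal_graph, {"errors": 1.0, "latency": 1.0, "cpu": 1.0})
    print(top_k(scores, 2))
```

In practice the personalization vector would carry anomaly evidence per metric rather than uniform values, so that nodes with stronger anomalies receive more restart mass.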
CL | Jul 17, 2024
MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models
Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu et al.
Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.
CL | Oct 15, 2023
RSVP: Customer Intent Detection via Agent Response Contrastive and Generative Pre-Training
Yu-Chien Tang, Wei-Yao Wang, An-Zi Yen et al.
The dialogue systems in customer services have been developed with neural models to provide users with precise answers and round-the-clock support in task-oriented conversations by detecting customer intents based on their utterances. Existing intent detection approaches have highly relied on adaptively pre-training language models with large-scale datasets, yet the predominant cost of data collection may hinder their superiority. In addition, they neglect the information within the conversational responses of the agents, which have a lower collection cost, but are significant to customer intent as agents must tailor their replies based on the customers' intent. In this paper, we propose RSVP, a self-supervised framework dedicated to task-oriented dialogues, which utilizes agent responses for pre-training in a two-stage manner. Specifically, we introduce two pre-training tasks to incorporate the relations of utterance-response pairs: 1) Response Retrieval by selecting a correct response from a batch of candidates, and 2) Response Generation by mimicking agents to generate the response to a given utterance. Our benchmark results for two real-world customer service datasets show that RSVP significantly outperforms the state-of-the-art baselines by 4.95% for accuracy, 3.4% for MRR@3, and 2.75% for MRR@5 on average. Extensive case studies are investigated to show the validity of incorporating agent responses into the pre-training stage.
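The MRR@3 and MRR@5 figures reported above refer to mean reciprocal rank truncated at the top k predictions. A minimal sketch of the standard metric follows; the function name `mrr_at_k` and the intent labels are hypothetical, and the paper's own evaluation script may differ in details such as tie handling.

```python
from typing import Hashable, Sequence


def mrr_at_k(
    ranked_predictions: Sequence[Sequence[Hashable]],
    gold_labels: Sequence[Hashable],
    k: int,
) -> float:
    """Mean reciprocal rank over the top-k predictions of each example.

    An example contributes 1/rank of its gold label if the label appears
    within the first k predictions, and 0 otherwise.
    """
    if len(ranked_predictions) != len(gold_labels) or not gold_labels:
        raise ValueError("need one non-empty ranking per gold label")
    total = 0.0
    for ranked, gold in zip(ranked_predictions, gold_labels):
        for position, label in enumerate(ranked[:k], start=1):
            if label == gold:
                total += 1.0 / position
                break
    return total / len(gold_labels)


if __name__ == "__main__":
    # Hypothetical intent rankings for three customer utterances.
    rankings = [
        ["refund", "shipping", "cancel"],
        ["shipping", "refund", "cancel"],
        ["cancel", "shipping", "refund"],
    ]
    gold = ["refund", "refund", "refund"]
    print(mrr_at_k(rankings, gold, k=3))
```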
LG | Dec 7, 2023 | Code
TimeDRL: Disentangled Representation Learning for Multivariate Time-Series
Ching Chang, Chiao-Tung Chan, Wei-Yao Wang et al.
Multivariate time-series data in numerous real-world applications (e.g., healthcare and industry) are informative but challenging due to the lack of labels and high dimensionality. Recent studies in self-supervised learning have shown their potential in learning rich representations without relying on labels, yet they fall short in learning disentangled embeddings and addressing issues of inductive bias (e.g., transformation-invariance). To tackle these challenges, we propose TimeDRL, a generic multivariate time-series representation learning framework with disentangled dual-level embeddings. TimeDRL is characterized by three novel features: (i) disentangled derivation of timestamp-level and instance-level embeddings from patched time-series data using a [CLS] token strategy; (ii) utilization of timestamp-predictive and instance-contrastive tasks for disentangled representation learning, with the former optimizing timestamp-level embeddings with predictive loss, and the latter optimizing instance-level embeddings with contrastive loss; and (iii) avoidance of augmentation methods to eliminate inductive biases, such as transformation-invariance from cropping and masking. Comprehensive experiments on 6 time-series forecasting datasets and 5 time-series classification datasets have shown that TimeDRL consistently surpasses existing representation learning approaches, achieving an average improvement of forecasting by 58.02% in MSE and classification by 1.48% in accuracy. Furthermore, extensive ablation studies confirmed the relative contribution of each component in TimeDRL's architecture, and semi-supervised learning evaluations demonstrated its effectiveness in real-world scenarios, even with limited labeled data. The code is available at https://github.com/blacksnail789521/TimeDRL.
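Feature (i) above, deriving two levels of embeddings from patched data with a [CLS] token, can be sketched at the data-layout level. The helpers below only show how a series might be cut into patches, how a [CLS] slot is prepended, and how an encoder's outputs would be split afterwards; the encoder itself is omitted, and all names, patch sizes, and the zero-valued [CLS] placeholder are assumptions rather than TimeDRL's actual code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def make_patches(series: Sequence[float], patch_len: int, stride: int) -> List[Vector]:
    """Cut a univariate series into (possibly overlapping) fixed-length patches."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [list(series[s:s + patch_len]) for s in range(0, last_start + 1, stride)]


def prepend_cls(patches: List[Vector], cls_token: Vector) -> List[Vector]:
    """Place a [CLS] slot in front of the patch sequence."""
    return [list(cls_token)] + patches


def split_dual_level(encoded: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Split encoder outputs into two levels.

    Position 0 (the [CLS] slot) is read as the instance-level embedding;
    the remaining positions are read as timestamp-level embeddings.
    """
    return encoded[0], encoded[1:]


if __name__ == "__main__":
    patches = make_patches([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], patch_len=4, stride=2)
    tokens = prepend_cls(patches, cls_token=[0.0, 0.0, 0.0, 0.0])
    # An identity "encoder" stands in for the real model in this sketch.
    instance_emb, timestamp_embs = split_dual_level(tokens)
    print(len(timestamp_embs), instance_emb)
```

Keeping the two outputs separate is what lets a predictive loss act on the timestamp-level embeddings while a contrastive loss acts on the instance-level one, as the abstract describes.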
SI | Dec 17, 2023 | Code
SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter
Ying-Ying Chang, Wei-Yao Wang, Wen-Chih Peng
In the dynamic and rapidly evolving world of social media, detecting anomalous users has become a crucial task to address malicious activities such as misinformation and cyberbullying. As a growing number of anomalous users become better at mimicking normal users and evading detection, existing methods that focus only on bot detection are ineffective at capturing subtle distinctions between users. To address these challenges, we propose SeGA, preference-aware self-contrastive learning for anomalous user detection, which leverages heterogeneous entities and their relations in the Twittersphere to detect anomalous users with different malicious strategies. SeGA utilizes the knowledge of large language models to summarize user preferences via posts. In addition, integrating user preferences with prompts as pseudo-labels for preference-aware self-contrastive learning enables the model to learn multifaceted aspects for describing the behaviors of users. Extensive experiments on the proposed TwBNT benchmark demonstrate that SeGA significantly outperforms the state-of-the-art methods (+3.5% to 27.6%) and empirically validate the effectiveness of the model design and pre-training strategies. Our code and data are publicly available at https://github.com/ying0409/SeGA.
LG | Jun 12, 2025 | Code
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Ching Chang, Jeehyun Hwang, Yidan Shi et al.
Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF includes specialized fusion modules, including a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions. The dataset is publicly available at https://github.com/blacksnail789521/Time-IMM, and the benchmark library can be accessed at https://github.com/blacksnail789521/IMM-TSF. Project page: https://blacksnail789521.github.io/time-imm-project-page/
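One of the fusion strategies named above is recency-aware averaging of asynchronous observations. A minimal sketch follows, assuming exponential decay in the time gap as the recency weighting; the decay form, the `tau` parameter, and the function name are illustrative assumptions and are not taken from the IMM-TSF library.

```python
import math
from typing import List, Sequence


def recency_weighted_average(
    embeddings: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    query_time: float,
    tau: float = 1.0,
) -> List[float]:
    """Average embeddings with weights exp(-(query_time - t) / tau).

    Only observations at or before query_time are used, so more recent
    observations receive larger weights. The exponential form is an assumption.
    """
    if len(embeddings) != len(timestamps):
        raise ValueError("one timestamp is required per embedding")
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    past = [(e, t) for e, t in zip(embeddings, timestamps) if t <= query_time]
    if not past:
        raise ValueError("no observations at or before query_time")
    weights = [math.exp(-(query_time - t) / tau) for _, t in past]
    total = sum(weights)
    dim = len(past[0][0])
    return [
        sum(w * e[d] for w, (e, _) in zip(weights, past)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical text embeddings observed at t=0 and t=1, queried at t=1.
    print(recency_weighted_average([[0.0, 1.0], [1.0, 0.0]], [0.0, 1.0], query_time=1.0))
```

A smaller `tau` concentrates the average on the newest observation, while a large `tau` approaches a plain mean.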
IR | Oct 29, 2022
Track2Vec: fairness music recommendation with a GPU-free customizable-driven framework
Wei-Wei Du, Wei-Yao Wang, Wen-Chih Peng
Recommendation systems have illustrated the significant progress made in characterizing users' preferences based on their past behaviors. Despite the effectiveness of recommending accurately, there exist several factors that are essential but unexplored for evaluating various facets of recommendation systems, e.g., fairness, diversity, and limited resources. To address these issues, we propose Track2Vec, a GPU-free customizable-driven framework for fairness music recommendation. In order to take both accuracy and fairness into account, our solution consists of three modules: a customized fairness-aware groups module for modeling different features based on configurable settings, a track representation learning module for learning better user embeddings, and an ensemble module for ranking the recommendation results from different track representation learning modules. Moreover, inspired by TF-IDF, which has been widely used in natural language processing, we introduce a metric called Miss Rate - Inverse Ground Truth Frequency (MR-ITF) to measure fairness. Extensive experiments demonstrate that our model achieves a 4th prize ranking in a GPU-free environment on the leaderboard in the EvalRS @ CIKM 2022 challenge, which is superior to the official baseline by about 200% in terms of the official scores. In addition, the ablation study illustrates the necessity of ensembling each group to acquire both accurate and fair recommendations.
AI | Nov 14, 2025
CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction
Cong-Tinh Dao, Nguyen Minh Thao Phan, Jun-En Ding et al.
Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient's health, which is crucial for informed therapeutic decision-making. Yet, most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across multiple data modalities, often focusing on a single data type or overlooking these complexities. In this paper, we present CURENet, a multimodal model (Combining Unified Representations for Efficient chronic disease prediction) that integrates unstructured clinical notes, lab tests, and patients' time-series data by utilizing large language models (LLMs) for clinical text processing and textual lab tests, as well as transformer encoders for longitudinal sequential visits. CURENet is capable of capturing the intricate interaction between different forms of clinical data and creating a more reliable predictive model for chronic illnesses. We evaluated CURENet using the public MIMIC-III and private FEMH datasets, where it achieved over 94% accuracy in predicting the top 10 chronic conditions in a multi-label framework. Our findings highlight the potential of multimodal EHR integration to enhance clinical decision-making and improve patient outcomes.
LG | Jun 12, 2025 | Code
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation
Ching Chang, Ming-Chih Lo, Wen-Chih Peng et al.
Multivariate time series data, collected across various fields such as manufacturing and wearable technology, exhibit states at multiple levels of granularity, from coarse-grained system behaviors to fine-grained, detailed events. Effectively segmenting and integrating states across these different granularities is crucial for tasks like predictive maintenance and performance optimization. However, existing time series segmentation methods face two key challenges: (1) the inability to handle multiple levels of granularity within a unified model, and (2) limited adaptability to new, evolving patterns in dynamic environments. To address these challenges, we propose PromptTSS, a novel framework for time series segmentation with multi-granularity states. PromptTSS uses a unified model with a prompting mechanism that leverages label and boundary information to guide segmentation, capturing both coarse- and fine-grained patterns while adapting dynamically to unseen patterns. Experiments show PromptTSS improves accuracy by 24.49% in multi-granularity segmentation, 17.88% in single-granularity segmentation, and up to 599.24% in transfer learning, demonstrating its adaptability to hierarchical states and evolving time series dynamics. Our code is available at https://github.com/blacksnail789521/PromptTSS.
CL | Feb 13, 2023
NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis
Yu-Chien Tang, Kuang-Da Wang, Ting-Yun Ou et al.
This paper presents a robust solution to the Memotion 3.0 Shared Task. The goal of this task is to classify the emotion and the corresponding intensity expressed by memes, which are usually in the form of images with short captions on social media. Understanding the multi-modal features of the given memes is the key to solving the task. In this work, we use CLIP to extract aligned image-text features and propose a novel meme sentiment analysis framework, consisting of a Cooperative Teaching Model (CTM) for Task A and a Cascaded Emotion Classifier (CEC) for Tasks B&C. CTM is based on the idea of knowledge distillation, and can better predict the sentiment of a given meme in Task A; CEC can leverage the emotion intensity suggestion from the prediction of Task C to classify the emotion more precisely in Task B. Experiments show that we achieved the 2nd place ranking for both Task A and Task B and the 4th place ranking for Task C, with weighted F1-scores of 0.342, 0.784, and 0.535 respectively. The results show the robustness and effectiveness of our framework. Our code is released on GitHub.
AI | Sep 15, 2025 | Code
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
Ching Chang, Yidan Shi, Defu Cao et al.
Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (https://github.com/blacksnail789521/Time-Series-Reasoning-Survey). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.
CV | Mar 13, 2025 | Code
Team NYCU at Defactify4: Robust Detection and Source Identification of AI-Generated Images Using CNN and CLIP-Based Models
Tsan-Tsung Yang, I-Wei Chen, Kuan-Ting Chen et al.
With the rapid advancement of generative AI, AI-generated images have become increasingly realistic, raising concerns about creativity, misinformation, and content authenticity. Detecting such images and identifying their source models has become a critical challenge in ensuring the integrity of digital media. This paper tackles the detection of AI-generated images and identifying their source models using CNN and CLIP-ViT classifiers. For the CNN-based classifier, we leverage EfficientNet-B0 as the backbone and feed with RGB channels, frequency features, and reconstruction errors, while for CLIP-ViT, we adopt a pretrained CLIP image encoder to extract image features and SVM to perform classification. Evaluated on the Defactify 4 dataset, our methods demonstrate strong performance in both tasks, with CLIP-ViT showing superior robustness to image perturbations. Compared to baselines like AEROBLADE and OCC-CLIP, our approach achieves competitive results. Notably, our method ranked Top-3 overall in the Defactify 4 competition, highlighting its effectiveness and generalizability. All of our implementations can be found in https://github.com/uuugaga/Defactify_4
LGDec 14, 2024Code
APAR: Modeling Irregular Target Functions in Tabular Regression via Arithmetic-Aware Pre-Training and Adaptive-Regularized Fine-TuningHong-Wei Wu, Wei-Yao Wang, Kuang-Da Wang et al.
Tabular data are fundamental in common machine learning applications, ranging from finance to genomics and healthcare. This paper focuses on tabular regression tasks, a field where deep learning (DL) methods are not consistently superior to machine learning (ML) models due to the challenges posed by irregular target functions inherent in tabular data, where minor variations in features can cause large changes in labels. To address these issues, we propose a novel Arithmetic-Aware Pre-training and Adaptive-Regularized Fine-tuning framework (APAR), which enables the model to fit irregular target functions in tabular data while reducing the negative impact of overfitting. In the pre-training phase, APAR introduces an arithmetic-aware pretext objective to capture intricate sample-wise relationships from the perspective of continuous labels. In the fine-tuning phase, a consistency-based adaptive regularization technique is proposed to self-learn appropriate data augmentation. Extensive experiments across 10 datasets demonstrated that APAR outperforms existing GBDT-, supervised NN-, and pretrain-finetune NN-based methods in RMSE (+9.43% $\sim$ 20.37%), and empirically validated the effects of pre-training tasks, including the study of arithmetic operations. Our code and data are publicly available at https://github.com/johnnyhwu/APAR.
LGJan 18, 2023
Detecting and Ranking Causal Anomalies in End-to-End Complex SystemChing Chang, Wen-Chih Peng
With the rapid development of technology, automated monitoring systems for large-scale factories are becoming increasingly important. Collecting large amounts of machine sensor data opens up many ways to find anomalies. We believe that the real core value of an automated monitoring system is to identify and track the cause of a problem. The best-known method for finding causal anomalies is RCA, but it has problems that cannot be ignored. RCA uses the AutoRegressive eXogenous (ARX) model to create a time-invariant correlation network as a machine profile, and then uses this profile to track causal anomalies by means of a method called fault propagation. There are two major problems in describing the behavior of a machine with the correlation network established by ARX: (1) it does not take into account the diversity of states, and (2) it does not separately consider correlations with different time lags. Based on these problems, we propose a framework called Ranking Causal Anomalies in End-to-End System (RCAE2E), which completely solves the problems mentioned above. In the experiments, we use synthetic data and real-world data from a large-scale photoelectric factory to verify that the hypotheses behind our method hold and that the method is correct.
CLJun 26, 2024Code
BADGE: BADminton report Generation and Evaluation with LLMShang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang et al.
Badminton enjoys widespread popularity, and reports on matches generally include details such as player names, game scores, and ball types, providing audiences with a comprehensive view of the games. However, writing these reports can be a time-consuming task. This challenge led us to explore whether a Large Language Model (LLM) could automate the generation and evaluation of badminton reports. We introduce a novel framework named BADGE, designed for this purpose using an LLM. Our method consists of two main phases: Report Generation and Report Evaluation. Initially, badminton-related data is processed by the LLM, which then generates a detailed report of the match. We tested different Input Data Types, In-Context Learning (ICL) strategies, and LLMs, finding that GPT-4 performs best when using the CSV data type and Chain of Thought prompting. Following report generation, the LLM evaluates and scores the reports to assess their quality. Our comparisons between the scores evaluated by GPT-4 and human judges show a tendency to prefer GPT-4 generated reports. Since the application of LLMs in badminton reporting remains largely unexplored, our research serves as a foundational step for future advancements in this area. Moreover, our method can be extended to other sports games, thereby enhancing sports promotion. For more details, please refer to https://github.com/AndyChiangSH/BADGE.
LGSep 2, 2023Code
DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate AppraisalWei-Wei Du, Wei-Yao Wang, Wen-Chih Peng
The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation models reducing the subjectivity of domain experts require a large number of transactions for effective evaluation, which is predominantly limited to not only the labeling efforts of transactions but also the generalizability of new developing and rural areas. To learn representations from unlabeled real estate sets, existing self-supervised learning (SSL) for tabular data neglects various important features, and fails to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction as the pretext task based on the metadata of the real estate for equipping the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to generalize the representations to be robust for limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.
CVJan 26, 2022Code
Team Yao at Factify 2022: Utilizing Pre-trained Models and Co-attention Networks for Multi-Modal Fact VerificationWei-Yao Wang, Wen-Chih Peng
In recent years, social media has enabled users to get exposed to a myriad of misinformation and disinformation; thus, misinformation has attracted a great deal of attention in research fields and as a social issue. To address the problem, we propose a framework, Pre-CoFact, composed of two pre-trained models for extracting features from text and images, and multiple co-attention networks for fusing the same modality but different sources and different modalities. Besides, we adopt the ensemble method by using different pre-trained models in Pre-CoFact to achieve better performance. We further illustrate the effectiveness from the ablation study and examine different pre-trained models for comparison. Our team, Yao, won the fifth prize (F1-score: 74.585%) in the Factify challenge hosted by De-Factify @ AAAI 2022, which demonstrates that our model achieved competitive performance without using auxiliary tasks or extra information. The source code of our work is publicly available at https://github.com/wywyWang/Multi-Modal-Fact-Verification-2021
LGSep 14, 2021Code
Exploring the Long Short-Term Dependencies to Infer Shot Influence in Badminton MatchesWei-Yao Wang, Teng-Fong Chan, Hui-Kuo Yang et al.
Identifying significant shots in a rally is important for evaluating players' performance in badminton matches. While several studies have quantified player performance in other sports, the analysis of badminton data has remained untouched. In this paper, we introduce a badminton language to fully describe the process of the shot and propose a deep learning model composed of a novel short-term extractor and a long-term encoder for capturing a shot-by-shot sequence in a badminton rally by framing the problem as predicting a rally result. Our model incorporates an attention mechanism to enable the transparency of the action sequence to the rally result, which is essential for badminton experts to gain interpretable predictions. Experimental evaluation based on a real-world dataset demonstrates that our proposed model outperforms the strong baselines. The source code is publicly available at https://github.com/yao0510/Shot-Influence.
LGFeb 2, 2024
A Survey on Self-Supervised Learning for Non-Sequential Tabular DataWei-Yao Wang, Wei-Wei Du, Derek Xu et al.
Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has become a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its correlation to related studies. Then, these approaches are categorized into three groups (predictive learning, contrastive learning, and hybrid learning), with the motivations and strengths of representative methods in each direction. Moreover, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to analyze the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry for SSL in the tabular domain, and of improving the foundations for implicit tabular data.
CLApr 29
Tree-of-Text: A Tree-based Prompting Framework for Table-to-Text Generation in the Sports DomainShang-Hsuan Chiang, Tsan-Tsung Yang, An-Zi Yen et al.
Generating sports game reports from structured tables is a complex table-to-text task that demands both precise data interpretation and fluent narrative generation. Traditional model-based approaches require large, annotated datasets, while prompt-based methods using large language models (LLMs) often struggle with hallucination due to weak table comprehension. To overcome these challenges, we propose Tree-of-Text, a tree-structured prompting framework that guides LLMs through a three-stage generation process: (1) Content Planning, where relevant operations and arguments are selected from the input tables; (2) Operation Execution, which breaks down large tables into manageable sub-tables; and (3) Content Generation, where short textual outputs are merged and rewritten into a cohesive report. Experiments show that our method outperforms existing methods on ShuttleSet+, leads in RG and CO metrics on RotoWire-FG, and excels in CS and CO on MLB with roughly 40% of the time and cost of Chain-of-Table. These results demonstrate the effectiveness and efficiency of Tree-of-Text and suggest a promising direction for prompt-based table-to-text generation in the sports domain.
CLMar 23, 2025
MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan GenerationHsin-Ling Hsu, Cong-Tinh Dao, Luning Wang et al.
Despite recent success in applying large language models (LLMs) to electronic health records (EHR), most systems focus primarily on assessment rather than treatment planning. We identify three critical limitations in current approaches: they generate treatment plans in a single pass rather than following the sequential reasoning process used by clinicians; they rarely incorporate patient-specific historical context; and they fail to effectively distinguish between subjective and objective clinical information. Motivated by the SOAP methodology (Subjective, Objective, Assessment, Plan), we introduce MedPlan, a novel framework that structures LLM reasoning to align with real-life clinician workflows. Our approach employs a two-stage architecture that first generates a clinical assessment based on patient symptoms and objective data, then formulates a structured treatment plan informed by this assessment and enriched with patient-specific information through retrieval-augmented generation. Comprehensive evaluation demonstrates that our method significantly outperforms baseline approaches in both assessment accuracy and treatment plan quality.
CLMar 2, 2024
Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR DataJun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng et al.
Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, focusing on research pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that clinicalBERT and PubMed-BERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic diseases and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan T-5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.
CLMar 15, 2024
Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-TuningShang-Hsuan Chiang, Ming-Chih Lo, Lin-Wei Chao et al.
In this paper, we present Pre-CoFactv3, a comprehensive framework comprised of Question Answering and Text Classification components for fact verification. Leveraging In-Context Learning, Fine-tuned Large Language Models (LLMs), and the FakeNet model, we address the challenges of fact verification. Our experiments explore diverse approaches, comparing different Pre-trained LLMs, introducing FakeNet, and implementing various ensemble methods. Notably, our team, Trifecta, secured first place in the AAAI-24 Factify 3.0 Workshop, surpassing the baseline accuracy by 103% and maintaining a 70% lead over the second competitor. This success underscores the efficacy of our approach and its potential contributions to advancing fact verification research.
CLJan 27, 2024
Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News DetectionWei-Yao Wang, Yu-Chieh Chang, Wen-Chih Peng
With the improvements in generative models, the issues of producing hallucinations in various domains (e.g., law, writing) have been brought to people's attention due to concerns about misinformation. In this paper, we focus on neural fake news, which refers to content generated by neural networks aiming to mimic the style of real news to deceive people. To prevent harmful disinformation spreading fallaciously from malicious social media (e.g., content farms), we propose a novel verification framework, Style-News, using publisher metadata to imply a publisher's template with the corresponding text types, political stance, and credibility. Based on threat modeling aspects, a style-aware neural news generator is introduced as an adversary for generating news content conditioning for a specific publisher, and style and source discriminators are trained to defend against this attack by identifying which publisher the style corresponds with, and discriminating whether the source of the given news is human-written or machine-generated. To evaluate the quality of the generated content, we integrate various dimensional metrics (language fluency, content preservation, and style adherence) and demonstrate that Style-News significantly outperforms the previous approaches by a margin of 0.35 for fluency, 15.24 for content, and 0.38 for style at most. Moreover, our discriminative model outperforms state-of-the-art baselines in terms of publisher prediction (up to 4.64%) and neural fake news detection (+6.94% $\sim$ 31.72%).
LGNov 1, 2024
Text2Freq: Learning Series Patterns from Text via Frequency DomainMing-Chih Lo, Ching Chang, Wen-Chih Peng
Traditional time series forecasting models mainly rely on historical numeric values to predict future outcomes. While these models have shown promising results, they often overlook the rich information available in other modalities, such as textual descriptions of special events, which can provide crucial insights into future dynamics. However, research that jointly incorporates text in time series forecasting remains relatively underexplored compared to other cross-modality work. Additionally, the modality gap between time series data and textual information poses a challenge for multimodal learning. To address this task, we propose Text2Freq, a cross-modality model that integrates text and time series data via the frequency domain. Specifically, our approach aligns textual information to the low-frequency components of time series data, establishing more effective and interpretable alignments between these two modalities. Our experiments on paired datasets of real-world stock prices and synthetic texts show that Text2Freq achieves state-of-the-art performance, with its adaptable architecture encouraging future research in this field.
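What "low-frequency components" means can be illustrated with a plain discrete Fourier transform. The sketch below is not the Text2Freq model; it only shows, under simple assumptions, how a series can be split so that its slow-moving part is kept. The function names (`dft`, `idft`, `low_frequency_part`) and the cutoff `k` are hypothetical and chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def low_frequency_part(series, k):
    """Keep frequency bins 0..k (and their mirrored bins), zero out the rest."""
    n = len(series)
    spectrum = dft(series)
    # Bin f and bin n - f carry the same physical frequency for real input.
    kept = [c if min(f, n - f) <= k else 0j for f, c in enumerate(spectrum)]
    return [v.real for v in idft(kept)]


# Toy series: one slow cycle plus a faster, smaller oscillation (8 cycles).
n = 32
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [0.5 * math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
series = [s + f for s, f in zip(slow, fast)]

trend = low_frequency_part(series, k=2)  # recovers the slow cycle
```

With `k=2` the 8-cycle oscillation is removed and `trend` matches `slow` up to floating-point error. In a cross-modality setting, a smooth target of this kind is the sort of signal that text could plausibly be aligned to, though how the alignment is learned is specific to the paper.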
CLSep 21, 2025
Extending Automatic Machine Translation Evaluation to Book-Length DocumentsKuang-Da Wang, Shuoyang Ding, Chao-Han Huck Yang et al.
Despite Large Language Models (LLMs) demonstrating superior translation performance and long-context capabilities, evaluation methodologies remain constrained to sentence-level assessment due to dataset limitations, token number restrictions in metrics, and rigid sentence boundary requirements. We introduce SEGALE, an evaluation scheme that extends existing automatic metrics to long-document translation by treating documents as continuous text and applying sentence segmentation and alignment methods. Our approach enables previously unattainable document-level evaluation, handling translations of arbitrary length generated with document-level prompts while accounting for under-/over-translations and varied sentence boundaries. Experiments show our scheme significantly outperforms existing long-form document evaluation schemes, while being comparable to evaluations performed with groundtruth sentence alignments. Additionally, we apply our scheme to book-length texts and newly demonstrate that many open-weight LLMs fail to effectively translate documents at their reported maximum context lengths.
LGOct 16, 2024
Self-Supervised Learning of Disentangled Representations for Multivariate Time-SeriesChing Chang, Chiao-Tung Chan, Wei-Yao Wang et al.
Multivariate time-series data in fields like healthcare and industry are informative but challenging due to high dimensionality and lack of labels. Recent self-supervised learning methods excel in learning rich representations without labels but struggle with disentangled embeddings and inductive bias issues like transformation-invariance. To address these challenges, we introduce TimeDRL, a framework for multivariate time-series representation learning with dual-level disentangled embeddings. TimeDRL features: (i) disentangled timestamp-level and instance-level embeddings using a [CLS] token strategy; (ii) timestamp-predictive and instance-contrastive tasks for representation learning; and (iii) avoidance of augmentation methods to eliminate inductive biases. Experiments on forecasting and classification datasets show TimeDRL outperforms existing methods, with further validation in semi-supervised settings with limited labeled data.
LGOct 11, 2025
MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State SegmentationChing Chang, Ming-Chih Lo, Chiao-Tung Chan et al.
Web platforms, mobile applications, and connected sensing systems generate multivariate time series with states at multiple levels of granularity, from coarse regimes to fine-grained events. Effective segmentation in these settings requires integrating across granularities while supporting iterative refinement through sparse prompt signals, which provide a compact mechanism for injecting domain knowledge. Yet existing prompting approaches for time series segmentation operate only within local contexts, so the effect of a prompt quickly fades and cannot guide predictions across the entire sequence. To overcome this limitation, we propose MemPromptTSS, a framework for iterative multi-granularity segmentation that introduces persistent prompt memory. A memory encoder transforms prompts and their surrounding subsequences into memory tokens stored in a bank. This persistent memory enables each new prediction to condition not only on local cues but also on all prompts accumulated across iterations, ensuring their influence persists across the entire sequence. Experiments on six datasets covering wearable sensing and industrial monitoring show that MemPromptTSS achieves 23% and 85% accuracy improvements over the best baseline in single- and multi-granularity segmentation under single iteration inference, and provides stronger refinement in iterative inference with average per-iteration gains of 2.66 percentage points compared to 1.19 for PromptTSS. These results highlight the importance of persistent memory for prompt-guided segmentation, establishing MemPromptTSS as a practical and effective framework for real-world applications.
CLOct 8, 2025
CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language ModelsYong-En Tian, Yu-Chien Tang, An-Zi Yen et al.
Aspect-based summarization has attracted significant attention for its ability to generate more fine-grained and user-aligned summaries. While most existing approaches assume a set of predefined aspects as input, real-world scenarios often present challenges where these given aspects may be incomplete, irrelevant, or entirely missing from the document. Users frequently expect systems to adaptively refine or filter the provided aspects based on the actual content. In this paper, we initiate this novel task setting, termed Content-Aware Refinement of Provided Aspects for Summarization (CARPAS), with the aim of dynamically adjusting the provided aspects based on the document context before summarizing. We construct three new datasets to facilitate our pilot experiments, and by using LLMs with four representative prompting strategies in this task, we find that LLMs tend to predict an overly comprehensive set of aspects, which often results in excessively long and misaligned summaries. Building on this observation, we propose a preliminary subtask to predict the number of relevant aspects, and demonstrate that the predicted number can serve as effective guidance for the LLMs, reducing the inference difficulty, and enabling them to focus on the most pertinent aspects. Our extensive experiments show that the proposed approach significantly improves performance across all datasets. Moreover, our deeper analyses uncover LLMs' compliance when the requested number of aspects differs from their own estimations, establishing a crucial insight for the deployment of LLMs in similar real-world applications.
AIAug 30, 2025
NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting TasksYen-Che Chien, Kuang-Da Wang, Wei-Yao Wang et al.
Recent advances in autonomous digital agents from industry (e.g., Manus AI and Gemini's research mode) highlight potential for structured tasks by autonomous decision-making and task decomposition; however, it remains unclear to what extent the agent-based systems can improve multimodal web data productivity. We study this in the realm of journalism, which requires iterative planning, interpretation, and contextual reasoning from multimodal raw contents to form a well-structured news article. We introduce NEWSAGENT, a benchmark for evaluating how agents can automatically search available raw contents, select desired information, and edit and rephrase to form a news article by accessing core journalistic functions. Given a writing instruction and firsthand data, mirroring how a journalist initiates a news draft, agents are tasked to identify narrative perspectives, issue keyword-based queries, retrieve historical background, and generate complete articles. Unlike typical summarization or retrieval tasks, essential context is not directly available and must be actively discovered, reflecting the information gaps faced in real-world news writing. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal contents converted to text for broad model compatibility. We evaluate open- and closed-source LLMs with commonly used agentic frameworks on NEWSAGENT, which shows that agents are capable of retrieving relevant facts but struggle with planning and narrative integration. We believe that NEWSAGENT serves as a realistic testbed for iterating and evaluating agent capabilities in terms of multimodal web data manipulation to real-world productivity.
LGJun 23, 2025
DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System ModelingYang Chang, Kuang-Da Wang, Ping-Chun Hsieh et al.
Uncovering the underlying ordinary differential equations (ODEs) that govern dynamic systems is crucial for advancing our understanding of complex phenomena. Traditional symbolic regression methods often struggle to capture the temporal dynamics and intervariable correlations inherent in ODEs. ODEFormer, a state-of-the-art method for inferring multidimensional ODEs from single trajectories, has made notable progress. However, its focus on single-trajectory evaluation is highly sensitive to initial starting points, which may not fully reflect true performance. To address this, we propose the divergence difference metric (DIV-diff), which evaluates divergence over a grid of points within the target region, offering a comprehensive and stable analysis of the variable space. Alongside, we introduce DDOT (Derivative-Directed Dual-Decoder Ordinary Differential Equation Transformer), a transformer-based model designed to reconstruct multidimensional ODEs in symbolic form. By incorporating an auxiliary task predicting the ODE's derivative, DDOT effectively captures both structure and dynamic behavior. Experiments on ODEBench show DDOT outperforms existing symbolic regression methods, achieving an absolute improvement of 4.58% and 1.62% in $P(R^2 > 0.9)$ for reconstruction and generalization tasks, respectively, and an absolute reduction of 3.55% in DIV-diff. Furthermore, DDOT demonstrates real-world applicability on an anesthesia dataset, highlighting its practical impact.
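The abstract does not spell out how DIV-diff is computed, so the following is only one plausible reading of "divergence over a grid of points": estimate the divergence of the true and the predicted vector field at each grid point and average the absolute gap. All names here (`divergence`, `make_grid`, `mean_divergence_gap`) and the finite-difference step `h` are assumptions for illustration, not the paper's definition.

```python
from itertools import product


def divergence(f, point, h=1e-5):
    """Central-difference estimate of div f = sum_i d f_i / d x_i at `point`."""
    total = 0.0
    for i in range(len(point)):
        fwd = list(point)
        bwd = list(point)
        fwd[i] += h
        bwd[i] -= h
        total += (f(fwd)[i] - f(bwd)[i]) / (2 * h)
    return total


def make_grid(lo, hi, steps, dim):
    """Regular grid with `steps` points per axis over [lo, hi]^dim."""
    axis = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return [list(p) for p in product(axis, repeat=dim)]


def mean_divergence_gap(f_true, f_pred, points):
    """Average absolute difference in divergence over the given points."""
    gaps = [abs(divergence(f_true, p) - divergence(f_pred, p)) for p in points]
    return sum(gaps) / len(gaps)


# Toy 2-D linear systems dx/dt = f(x); divergences are -3 and -2.5.
def f_true(x):
    return [-1.0 * x[0], -2.0 * x[1]]


def f_pred(x):
    return [-1.0 * x[0], -1.5 * x[1]]


points = make_grid(-1.0, 1.0, steps=5, dim=2)
gap = mean_divergence_gap(f_true, f_pred, points)  # approximately 0.5
```

Because the score is averaged over many points in the region rather than computed along one trajectory, it does not depend on a particular initial condition, which is the motivation the abstract gives for the metric.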
LGJun 8, 2025
Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced RegressionYung-Chien Wang, Kuang-Da Wang, Wei-Yao Wang et al.
Tabular data serve as a fundamental and ubiquitous representation of structured information in numerous real-world applications, e.g., finance and urban planning. In the realm of tabular imbalanced applications, data imbalance has been investigated in classification tasks with insufficient instances in certain labels, causing the model's ineffective generalizability. However, the imbalance issue of tabular regression tasks is underexplored, and yet is critical due to unclear boundaries for continuous labels and simplifying assumptions in existing imbalance regression work, which often rely on known and balanced test distributions. Such assumptions may not hold in practice and can lead to performance degradation. To address these issues, we propose MATI: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalance Regression, featuring two key innovations: (i) the Region-Aware Mixture Expert, which adopts a Gaussian Mixture Model to capture the underlying related regions. The statistical information of each Gaussian component is then used to synthesize and train region-specific experts to capture the unique characteristics of their respective regions. (ii) Test-Time Self-Supervised Expert Aggregation, which dynamically adjusts region expert weights based on test data features to reinforce expert adaptation across varying test distributions. We evaluated MATI on four real-world tabular imbalance regression datasets, including house pricing, bike sharing, and age prediction. To reflect realistic deployment scenarios, we adopted three types of test distributions: a balanced distribution with uniform target frequencies, a normal distribution that follows the training data, and an inverse distribution that emphasizes rare target regions. On average across these three test distributions, MATI achieved a 7.1% improvement in MAE compared to existing methods.
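The three test distributions described above can be mimicked on a single labelled set by reweighting samples according to how common their target bin is. This is a minimal sketch under that assumption; the paper may instead resample the test data, and `bin_index`, `weighted_mae`, and the equal-width binning are hypothetical choices made here for illustration.

```python
from collections import Counter


def bin_index(y, lo, hi, n_bins):
    """Map a target value to one of n_bins equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    i = int((y - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def weighted_mae(y_true, y_pred, mode="normal", n_bins=4):
    """MAE under a simulated test distribution over target bins.

    mode="normal":   keep the empirical bin frequencies
    mode="balanced": every occupied bin gets equal total weight
    mode="inverse":  bin weight proportional to 1 / empirical frequency
    """
    lo, hi = min(y_true), max(y_true)
    bins = [bin_index(y, lo, hi, n_bins) for y in y_true]
    n = len(y_true)
    p = {b: c / n for b, c in Counter(bins).items()}  # empirical frequency
    if mode == "normal":
        q = dict(p)
    elif mode == "balanced":
        q = {b: 1 / len(p) for b in p}
    elif mode == "inverse":
        z = sum(1 / v for v in p.values())
        q = {b: (1 / v) / z for b, v in p.items()}
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Importance weight of each sample: target frequency / empirical frequency.
    weights = [q[b] / p[b] for b in bins]
    errors = [abs(t - y) for t, y in zip(y_true, y_pred)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


# Toy data: six common targets predicted exactly, two rare targets off by 4.
y_true = [1, 1, 1, 1, 1, 1, 10, 10]
y_pred = [1, 1, 1, 1, 1, 1, 6, 6]

scores = {m: weighted_mae(y_true, y_pred, m) for m in ("normal", "balanced", "inverse")}
```

On this toy data the error grows from the normal to the balanced to the inverse setting, since the only mistakes are on the rare targets. That pattern is what makes rare-region performance visible when the scores are averaged across the three settings.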
LGMar 16, 2025
EXPRESS: An LLM-Generated Explainable Property Valuation System with Neighbor ImputationWei-Wei Du, Yung-Chien Wang, Wen-Chih Peng
The demand for property valuation has attracted significant attention from sellers, buyers, and customers applying for loans. Reviews of existing approaches have revealed two shortcomings, an inability to handle missing values and a lack of interpretability, which prevent their use in real-world applications. To address these challenges, we propose an LLM-Generated EXplainable PRopErty valuation SyStem with neighbor imputation called EXPRESS, which provides a customizable missing value imputation technique and addresses the opaqueness of prediction by providing feature-wise explanations generated by an LLM. The dynamic nearest neighbor search finds similar properties for different application scenarios according to a property configuration set by users (e.g., house age as the criterion for houses in rural areas, and location for buildings in urban areas). Motivated by the human appraisal procedure, we generate feature-wise explanations to provide users with a more intuitive understanding of the prediction results.
AIMar 11, 2025
Imitation Learning of Correlated Policies in Stackelberg GamesKuang-Da Wang, Ping-Chun Hsieh, Wen-Chih Peng
Stackelberg games, widely applied in domains like economics and security, involve asymmetric interactions where a leader's strategy drives follower responses. Accurately modeling these dynamics allows domain experts to optimize strategies in interactive scenarios, such as turn-based sports like badminton. In multi-agent systems, agent behaviors are interdependent, and traditional Multi-Agent Imitation Learning (MAIL) methods often fail to capture these complex interactions. Correlated policies, which account for opponents' strategies, are essential for accurately modeling such dynamics. However, even methods designed for learning correlated policies, like CoDAIL, struggle in Stackelberg games due to their asymmetric decision-making, where leaders and followers cannot simultaneously account for each other's actions, often leading to non-correlated policies. Furthermore, existing MAIL methods that match occupancy measures or use adversarial techniques like GAIL or Inverse RL face scalability challenges, particularly in high-dimensional environments, and suffer from unstable training. To address these challenges, we propose a correlated policy occupancy measure specifically designed for Stackelberg games and introduce the Latent Stackelberg Differential Network (LSDN) to match it. LSDN models two-agent interactions as shared latent state trajectories and uses multi-output Geometric Brownian Motion (MO-GBM) to effectively capture joint policies. By leveraging MO-GBM, LSDN disentangles environmental influences from agent-driven transitions in latent space, enabling the simultaneous learning of interdependent policies. This design eliminates the need for adversarial training and simplifies the learning process. Extensive experiments on Iterative Matrix Games and multi-agent particle environments demonstrate that LSDN can better reproduce complex interaction dynamics than existing MAIL methods.
CLFeb 28, 2025
Test-Time Alignment for Large Language Models via Textual Model Predictive ControlKuang-Da Wang, Teng-Ruei Chen, Yu Heng Hung et al.
Aligning Large Language Models (LLMs) with human preferences through finetuning is resource-intensive, motivating lightweight alternatives at test time. We address test-time alignment through the lens of sequential decision making, a perspective that reveals two fundamental challenges. When actions are defined at the token level, as in guided decoding, alignment suffers from the curse of horizon. Conversely, when actions are at the response level, as in traditional iterative refinement, the curse of dimensionality emerges. To resolve this trade-off, we draw inspiration from Model Predictive Control (MPC) in control theory to propose Textual Model Predictive Control (TMPC), a novel predictive planning framework adapted for aligning LLMs at inference time. A key limitation of standard MPC is its reliance on predefined, hard segment boundaries, which are often absent in text generation. TMPC overcomes this by introducing two principles inspired by hierarchical reinforcement learning: (1) Hindsight Subgoal Identification, where TMPC retrospectively identifies high-reward intermediate outputs as subgoals. This allows the framework to discover meaningful, task-specific planning steps (e.g., a sentence in machine translation or a bug fix in code generation). (2) Subgoal-Conditioned Re-Generation, where these identified subgoals are used to guide subsequent planning iterations. By conditioning on these proven, high-quality subgoals, TMPC ensures stable improvement by building upon previously validated successes. TMPC is evaluated on three tasks with distinct segmentation properties: discourse-level translation, long-form response generation, and program synthesis. The results demonstrate that TMPC consistently improves performance, highlighting the generality of the approach.
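The receding-horizon idea behind classical MPC, which the abstract above takes as its starting point, can be sketched on a toy numeric problem. This is not TMPC itself: the 1-D state, the action set, the cost, and the names `plan` and `run_mpc` are all assumptions chosen for illustration. The planner searches action sequences over a short horizon, executes only the first action, and then replans from the new state.

```python
from itertools import product


def plan(x, target, horizon, actions=(-1, 0, 1)):
    """Brute-force search for the lowest-cost action sequence over the horizon."""
    best_cost, best_seq = None, None
    for seq in product(actions, repeat=horizon):
        state, cost = x, 0
        for a in seq:
            state += a                     # toy dynamics: next state = state + action
            cost += abs(state - target)    # toy cost: distance to target at each step
        if best_cost is None or cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def run_mpc(x0, target, horizon, steps):
    """Receding-horizon loop: plan, apply only the first action, replan."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        first_action = plan(x, target, horizon)[0]
        x += first_action
        trajectory.append(x)
    return trajectory


trajectory = run_mpc(x0=0, target=3, horizon=2, steps=5)
```

In this toy setting the state walks to the target and stays there. The fixed horizon here plays the role of the predefined segment boundary that, per the abstract, is often absent in text generation.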
AIMar 19, 2024
Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian MotionKuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh et al.
Among turn-based sports, with their dynamic and rapidly changing tactics, badminton stands out as a paradigm that requires alternate-dependent decision-making from players. While learning from offline expert data for sequential decision-making has advanced in various domains, rally-wise imitation of human players' behaviors from offline badminton matches remains underexplored. Replicating opponents' behavior lets players develop targeted strategies before matches. However, directly applying existing methods is hampered by the inherent hierarchy of the match and by the compounding effect that arises because players take actions alternately. In this paper, we propose RallyNet, a novel hierarchical offline imitation learning model for badminton player behaviors: (i) RallyNet captures players' decision dependencies by modeling decision-making processes as a contextual Markov decision process. (ii) RallyNet leverages the experience to generate context as the agent's intent in the rally. (iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players, introducing a valuable inductive bias for learning player behaviors. In this manner, RallyNet links player intents with GBM-based interaction models, providing an understanding of interactions for sports analytics. We extensively validate RallyNet on the largest available real-world badminton dataset, consisting of men's and women's singles, demonstrating its ability to imitate player behaviors. Results reveal RallyNet's superiority over offline imitation learning methods and state-of-the-art turn-based approaches, outperforming them by at least 16% in mean rule-based agent normalization score. Furthermore, we discuss various practical use cases to highlight RallyNet's applicability.
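For readers unfamiliar with Geometric Brownian Motion, the following is a minimal sketch of the textbook scalar process, simulated with its exact log-normal update. It is not RallyNet's latent interaction model; the function name `simulate_gbm` and all parameter values are assumptions for illustration only.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, dt, steps, seed=0):
    """Simulate one GBM path: dS = mu * S dt + sigma * S dW.

    Uses the exact update S_{t+dt} = S_t * exp((mu - sigma^2 / 2) dt + sigma sqrt(dt) Z),
    with Z drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        log_growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(log_growth))
    return path


# With sigma = 0 the path reduces to deterministic exponential growth.
deterministic = simulate_gbm(s0=1.0, mu=0.1, sigma=0.0, dt=0.5, steps=4)
noisy = simulate_gbm(s0=1.0, mu=0.1, sigma=0.3, dt=0.01, steps=100, seed=1)
```

Because each step multiplies the state by a positive factor, a path that starts positive stays positive.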
AIDec 18, 2023
ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in BadmintonWei-Yao Wang, Wen-Chih Peng, Wei Wang et al.
Agent forecasting systems have been explored to investigate agent patterns and improve decision-making in various domains, e.g., pedestrian predictions and marketing bidding. Badminton represents a fascinating example of a multifaceted turn-based sport, requiring both sophisticated tactic developments and alternate-dependent decision-making. Recent deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions. However, a critical obstacle lies in the unclear functionality of which features are learned for simulating players' behaviors by black-box models, where existing explainers are not equipped with turn-based and multi-output attributions. To bridge this gap, we propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values. ShuttleSHAP is a model-agnostic explainer that aims to quantify contribution by not only temporal aspects but also player aspects in terms of multifaceted cues. Incorporating the proposed analysis tool into the state-of-the-art turn-based forecasting model on the benchmark dataset reveals that it is, in fact, insignificant to reason about past strokes, while conventional sequential models have greater impacts. Instead, players' styles influence the models for the future simulation of a rally. On top of that, we investigate and discuss the causal analysis of these findings and demonstrate the practicability with local analysis.
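ShuttleSHAP is described above as building on variants of Shapley values. As background, here is a minimal sketch of the classical Shapley value computed by exact subset enumeration. It is not the turn-based variant from the paper; the feature names and the toy value function are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical toy game: additive feature effects plus a bonus
# when features "a" and "b" are both present.
BASE = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(coalition):
    bonus = 3.0 if {"a", "b"} <= coalition else 0.0
    return sum(BASE[p] for p in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy game the interaction bonus is split evenly between "a" and "b", and the attributions sum to the value of the full coalition. Exact enumeration is exponential in the number of features, so it is only practical for very small feature sets.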
LGDec 30, 2021
A Lightweight and Accurate Spatial-Temporal Transformer for Traffic ForecastingGuanyao Li, Shuhan Zhong, S. -H. Gary Chan et al.
We study the forecasting problem for traffic with dynamic, possibly periodic, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t in any region. Prior art in the area often considers the spatial and temporal dependencies in a decoupled manner, or is computationally intensive to train with a large number of hyper-parameters to tune. We propose ST-TIS, a novel, lightweight, and accurate Spatial-Temporal Transformer that extends the canonical Transformer with information fusion and region sampling for traffic forecasting. The information fusion module captures the complex spatial-temporal dependency between regions. The region sampling module improves efficiency and prediction accuracy, cutting the computational complexity of dependency learning from $O(n^2)$ to $O(n\sqrt{n})$, where n is the number of regions. With far fewer parameters than state-of-the-art models, the offline training of our model is significantly faster in terms of tuning and computation (with a reduction of up to $90\%$ in training time and network parameters). Notwithstanding such training efficiency, extensive experiments show that ST-TIS is substantially more accurate in online prediction than state-of-the-art approaches (with an average improvement of up to $9.5\%$ on RMSE, and $12.4\%$ on MAPE).
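To give a feel for the $O(n^2)$ versus $O(n\sqrt{n})$ gap quoted above, the sketch below counts region-to-region dependency computations under a simplifying assumption: each region attends to roughly $\sqrt{n}$ sampled regions instead of all n. The abstract does not specify how regions are sampled, so this illustrates the scaling only; the function names are hypothetical.

```python
import math


def full_pairs(n):
    """Pairwise dependencies when every region attends to every region."""
    return n * n


def sampled_pairs(n):
    """Dependencies when each region attends to about sqrt(n) sampled regions (assumed)."""
    return n * math.ceil(math.sqrt(n))


for n in (100, 1024, 10000):
    ratio = full_pairs(n) / sampled_pairs(n)
    print(f"n={n}: full={full_pairs(n)}, sampled={sampled_pairs(n)}, ratio={ratio:.0f}x")
```

The saving factor is about $\sqrt{n}$, so it grows with the number of regions in the city.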
LGDec 2, 2021
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in BadmintonWei-Yao Wang, Hong-Han Shuai, Kai-Shiang Chang et al.
The increasing demand for sports analytics has stimulated productive studies from a variety of perspectives, e.g., health state monitoring and outcome prediction. In this paper, we focus on objectively judging what stroke to return and where to return it, which is still unexplored in turn-based sports. By formulating stroke forecasting as a sequence prediction task, existing works can tackle the problem but fail to model information based on the characteristics of badminton. To address these limitations, we propose a novel Position-aware Fusion of Rally Progress and Player Styles framework (ShuttleNet) that incorporates rally progress and information of the players by two modified encoder-decoder extractors. Moreover, we design a fusion network to integrate rally contexts and contexts of the players by conditioning on information dependency and different positions. Extensive experiments on the badminton dataset demonstrate that ShuttleNet significantly outperforms the state-of-the-art methods and also empirically validate the feasibility of each component in ShuttleNet. On top of that, we provide an analysis scenario for the stroke forecasting problem.
LGJul 9, 2020
Probabilistic Value Selection for Space Efficient ModelGunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu et al.
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset with two goals: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments are conducted on benchmark datasets of various sizes, and the results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experimental results show that value selection can balance accuracy and model size reduction.
CLFeb 6, 2020
Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline GenerationYun-Zhu Song, Hong-Han Shuai, Sung-Lin Yeh et al.
With the rapid proliferation of online media sources and published news, headlines have become increasingly important for attracting readers to news articles, since users may be overwhelmed by the sheer volume of information. In this paper, we generate inspired headlines that preserve the nature of news articles and catch the eye of the reader simultaneously. The task of inspired headline generation can be viewed as a specific form of the Headline Generation (HG) task, with the emphasis on creating an attractive headline from a given news article. To generate inspired headlines, we propose a novel framework called POpularity-Reinforced Learning for inspired Headline Generation (PORL-HG). PORL-HG exploits the extractive-abstractive architecture with 1) Popular Topic Attention (PTA) for guiding the extractor to select the attractive sentence from the article and 2) a popularity predictor for guiding the abstractor to rewrite the attractive sentence. Moreover, since the sentence selection of the extractor is not differentiable, reinforcement learning (RL) techniques are utilized to bridge the gap with rewards obtained from a popularity score predictor. Through quantitative and qualitative experiments, we show that the proposed PORL-HG significantly outperforms the state-of-the-art headline generation models in terms of attractiveness evaluated by both humans (71.03%) and the predictor (at least 27.60%), while the faithfulness of PORL-HG is also comparable to the state-of-the-art generation model.
LGNov 7, 2019
Sequence-Aware Factorization Machines for Temporal Predictive AnalyticsTong Chen, Hongzhi Yin, Quoc Viet Hung Nguyen et al.
In various web applications like targeted advertising and recommender systems, the available categorical features (e.g., product type) are often of great importance but sparse. As a widely adopted solution, models based on Factorization Machines (FMs) are capable of modelling high-order interactions among features for effective sparse predictive analytics. As the volume of web-scale data grows exponentially over time, sparse predictive analytics inevitably involves dynamic and sequential features. However, existing FM-based models assume no temporal orders in the data, and are unable to capture the sequential dependencies or patterns within the dynamic features, impeding the performance and adaptivity of these methods. Hence, in this paper, we propose a novel Sequence-Aware Factorization Machine (SeqFM) for temporal predictive analytics, which models feature interactions by fully investigating the effect of sequential dependencies. As static features (e.g., user gender) and dynamic features (e.g., user interacted items) express different semantics, we innovatively devise a multi-view self-attention scheme that separately models the effect of static features, dynamic features and the mutual interactions between static and dynamic features in three different views. In SeqFM, we further map the learned representations of feature interactions to the desired output with a shared residual network. To showcase the versatility and generalizability of SeqFM, we test SeqFM in three popular application scenarios for FM-based models, namely ranking, classification and regression tasks. Extensive experimental results on six large-scale datasets demonstrate the superior effectiveness and efficiency of SeqFM.
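As background for the abstract above, the following sketches a standard second-order Factorization Machine prediction, the base model that SeqFM builds on. It does not implement SeqFM's multi-view self-attention. The function names and toy parameters are assumptions for illustration. The second function uses the well-known reformulation that computes all pairwise interactions in time linear in the number of features.

```python
def fm_predict_naive(x, w0, w, v):
    """y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, computed pair by pair."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, v):
    """Same prediction via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - sq)
    return y


# Hypothetical toy parameters: 3 features, 2 latent factors per feature.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

y_naive = fm_predict_naive(x, w0, w, v)
y_fast = fm_predict_fast(x, w0, w, v)
```

Both functions return the same value. The factorized form is what makes FMs practical for the sparse, high-dimensional categorical features the abstract mentions, since features with a zero value contribute nothing to either sum.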