Inferring Events from Time Series using Language Models
This work addresses the problem of interpreting time series data for decision-making in domains such as finance and healthcare, but it is incremental, as it applies existing methods to a new task.
The study investigates whether Large Language Models (LLMs) can infer natural language events from time series data. Several models, including OpenAI's o1 and DS-R1-distill-Qwen-32B, show promising abilities, and post-training optimizations significantly improve Qwen2.5 1.5B, which achieves results second only to o1.
Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. A common goal in analyzing time series data is to understand the underlying events that cause the observed variations. We conduct the first study of whether Large Language Models (LLMs) can infer events described in natural language from time series data. We evaluate 18 LLMs on the task of matching event sequences to real-valued time series, using a new benchmark that we build from sports data. Several current LLMs demonstrate promising abilities: OpenAI's o1 performs best, while DS-R1-distill-Qwen-32B outperforms proprietary models such as GPT-4o. Drawing on insights from an analysis of reasoning failures, we also find clear avenues to improve performance. By applying post-training optimizations, i.e., distillation and self-improvement, we significantly enhance the performance of Qwen2.5 1.5B, achieving results second only to o1. All resources needed to reproduce our work are available: https://github.com/BennyTMT/GAMETime
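
The matching task described above can be illustrated with a minimal sketch. It assumes a multiple-choice framing in which a model sees one time series and several candidate event sequences and must pick the sequence that produced the series. The abstract does not specify the benchmark's actual format, so the names `MatchingItem`, `build_prompt`, `parse_choice`, and `accuracy`, the prompt wording, and the example values are all hypothetical and are not taken from the GAMETime repository.

```python
from dataclasses import dataclass
from typing import List, Sequence
import string


@dataclass
class MatchingItem:
    """One hypothetical item: a series plus candidate event sequences."""
    series: List[float]          # real-valued time series (illustrative values)
    candidates: List[List[str]]  # candidate event sequences in natural language
    answer: int                  # index of the sequence that matches the series


def build_prompt(item: MatchingItem) -> str:
    """Render the item as a multiple-choice question (assumed prompt format)."""
    values = ", ".join(f"{v:.2f}" for v in item.series)
    lines = [f"Time series: [{values}]", "Which event sequence best explains it?"]
    for label, events in zip(string.ascii_uppercase, item.candidates):
        lines.append(f"({label}) " + " -> ".join(events))
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def parse_choice(reply: str, n_options: int) -> int:
    """Map a reply such as 'B' or '(b).' to an option index, or -1 if invalid."""
    valid = string.ascii_uppercase[:n_options]
    cleaned = reply.strip().strip("().").upper()
    if len(cleaned) == 1 and cleaned in valid:
        return valid.index(cleaned)
    return -1


def accuracy(items: Sequence[MatchingItem], replies: Sequence[str]) -> float:
    """Fraction of items whose parsed reply equals the correct option."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    correct = sum(
        parse_choice(reply, len(item.candidates)) == item.answer
        for item, reply in zip(items, replies)
    )
    return correct / len(items)


# Illustrative item with made-up values; replies would come from an LLM.
example = MatchingItem(
    series=[0.50, 0.62, 0.48],
    candidates=[
        ["Home team scores", "Away team scores twice"],
        ["Away team scores", "Home team scores twice"],
    ],
    answer=0,
)
print(build_prompt(example))
```

In this sketch, unparseable replies count as wrong rather than being dropped, which keeps the accuracy denominator equal to the number of items.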