CLJul 1, 2024Code
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable ApproachesJiayi Yuan, Hongyi Liu, Shaochen Zhong et al.
Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs; where multiple schools of efficiency-driven approaches - such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures - have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights - as well as a friendly workbench - for the future development of long context-capable LLMs. The source code is available at https://github.com/henryzhongsc/longctx_bench.
LGJun 20, 2024
LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series ForecastingYu-Neng Chuang, Songchen Li, Jiayi Yuan et al.
Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired by the success of Large Language Models (LLMs), researchers are now developing Large Time Series Models (LTSMs)-universal transformer-based models that use autoregressive prediction-to improve TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities. However, these design choices are typically studied and evaluated in isolation and are not benchmarked collectively. In this work, we introduce LTSM-Bundle, a comprehensive toolbox, and benchmark for training LTSMs, spanning pre-processing techniques, model configurations, and dataset configuration. It modularized and benchmarked LTSMs from multiple dimensions, encompassing prompting strategies, tokenization approaches, training paradigms, base model selection, data quantity, and dataset diversity. Furthermore, we combine the most effective design choices identified in our study. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performances compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.