CLFeb 28

LaSTR: Language-Driven Time-Series Segment Retrieval

Kota Dohi, Harsh Purohit, Tomoya Nishida, Takashi Endo, Yusuke Ohtsubo, Koichiro Yawata, Koki Takeshita, Tatsuya Sasaki, Yohei Kawaguchi
arXiv:2603.00725v1
Originality Incremental advance
AI Analysis

This addresses the challenge of effectively searching time-series data for system analysis, which is incremental as it builds on existing methods like CLIP with a novel application to time-series segmentation.

The paper tackles the problem of retrieving relevant local segments from large time-series repositories using natural language queries, and the result shows that LaSTR outperforms random and CLIP baselines across all settings, yielding improved ranking quality and stronger semantic agreement.

Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment--caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text--time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes