IR · AI · Aug 25, 2025

HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data

arXiv:2508.18048v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

HyST offers a scalable and accurate approach for real-world recommendation systems that handle complex user queries. The advance appears incremental: it builds on existing techniques (LLMs and embeddings), and the contribution lies in how they are combined.

The paper tackles the problem of retrieving information from semi-structured tabular data when user queries mix structured constraints and unstructured preferences, introducing HyST, a hybrid retrieval framework that combines LLM-powered structured filtering with semantic embedding search. Experiments on a semi-structured benchmark show HyST consistently outperforms traditional baselines, improving retrieval precision.

User queries in real-world recommendation systems often combine structured constraints (e.g., category, attributes) with unstructured preferences (e.g., product descriptions or reviews). We introduce HyST (Hybrid retrieval over Semi-structured Tabular data), a hybrid retrieval framework that combines LLM-powered structured filtering with semantic embedding search to support complex information needs over semi-structured tabular data. HyST extracts attribute-level constraints from natural language using large language models (LLMs) and applies them as metadata filters, while processing the remaining unstructured query components via embedding-based retrieval. Experiments on a semi-structured benchmark show that HyST consistently outperforms traditional baselines, highlighting the importance of structured filtering in improving retrieval precision and offering a scalable and accurate solution for real-world user queries.
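
The two-stage flow described in the abstract (extract attribute constraints, filter on metadata, then rank the survivors by embedding similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the `ITEMS` catalog, the function names, the keyword lookup that stands in for the LLM extraction step, and the bag-of-words vectors that stand in for a learned embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalog: structured attributes plus a free-text description.
ITEMS = [
    {"id": 1, "category": "laptop", "brand": "acme",
     "text": "lightweight laptop with long battery life for travel"},
    {"id": 2, "category": "laptop", "brand": "zenco",
     "text": "heavy gaming laptop with loud fans"},
    {"id": 3, "category": "phone", "brand": "acme",
     "text": "lightweight phone with long battery life"},
]

# Hypothetical attribute vocabulary used by the placeholder extractor.
VOCAB = {"category": {"laptop", "phone"}, "brand": {"acme", "zenco"}}


def extract_constraints(query):
    """Placeholder for the LLM step: split a query into attribute
    filters and the remaining free text. A real system would prompt
    an LLM here; this keyword lookup only mimics the output shape."""
    filters, rest = {}, []
    for token in re.findall(r"[a-z]+", query.lower()):
        for attr, values in VOCAB.items():
            if token in values:
                filters[attr] = token
                break
        else:
            # Token matched no attribute value, so it stays unstructured.
            rest.append(token)
    return filters, " ".join(rest)


def embed(text):
    """Toy bag-of-words vector, standing in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, items, k=3):
    """Filter by extracted attributes, then rank by text similarity."""
    filters, rest = extract_constraints(query)
    candidates = [it for it in items
                  if all(it.get(a) == v for a, v in filters.items())]
    q = embed(rest)
    ranked = sorted(candidates,
                    key=lambda it: cosine(q, embed(it["text"])),
                    reverse=True)
    return [it["id"] for it in ranked[:k]]


print(hybrid_search("acme laptop with long battery life", ITEMS))  # [1]
```

In this sketch the filter runs before ranking, so an item that fails a structured constraint can never be returned, however similar its description is to the query. Whether HyST applies its filters before or after the embedding search is not stated in the abstract.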
