CL, IR | May 17, 2025

Recursive Question Understanding for Complex Question Answering over Heterogeneous Personal Data

arXiv:2505.11900v1 | 3 citations | h-index: 8 | ACL
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of giving users convenient, on-device access to their personal data. It appears incremental, since it builds on existing verbalization and encoding approaches.

The paper tackles complex question answering over heterogeneous personal data, such as text and tables. It introduces ReQAP, a method that builds an executable operator tree through recursive decomposition of the question, so that executing the tree yields a traceable answer. It also releases the PerQA benchmark for evaluation.

Question answering over mixed sources, like text and tables, has been advanced by verbalizing all contents and encoding them with a language model. A prominent case of such heterogeneous data is personal information: user devices log vast amounts of data every day, such as calendar entries, workout statistics, shopping records, streaming history, and more. Information needs range from simple look-ups to queries of an analytical nature. The challenge is to provide humans with convenient access with a small footprint, so that all personal data stays on the user's devices. We present ReQAP, a novel method that creates an executable operator tree for a given question via recursive decomposition. Operators are designed to enable seamless integration of structured and unstructured sources, and the execution of the operator tree yields a traceable answer. We further release the PerQA benchmark, with persona-based data and questions, covering a diverse spectrum of realistic user needs.
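To make the idea of an executable operator tree more concrete, here is a minimal Python sketch for a question like "How many km did I run in total?". It is an illustration under stated assumptions, not ReQAP's implementation: the operator names (RETRIEVE, EXTRACT, UNION, SUM), the `Node` class, the `execute` function, and the toy data are all hypothetical, and the tree is written by hand, whereas the abstract describes deriving it from the question by recursive decomposition.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical toy data: one structured source (table rows) and one
# unstructured source (free-text notes). Not taken from PerQA.
WORKOUTS = [
    {"type": "run", "date": "2025-05-01", "km": 5.0},
    {"type": "run", "date": "2025-05-03", "km": 7.5},
    {"type": "swim", "date": "2025-05-04", "km": 1.0},
]
NOTES = [
    {"date": "2025-05-06", "text": "Went for a run along the river, about 4 km."},
]


@dataclass
class Node:
    """One operator in the tree; its children are evaluated first."""
    name: str
    fn: Callable[..., object]
    children: List["Node"] = field(default_factory=list)


def execute(node, trace):
    """Evaluate the tree bottom-up, recording every intermediate result."""
    inputs = [execute(child, trace) for child in node.children]
    result = node.fn(*inputs)
    trace.append((node.name, result))
    return result


def retrieve_table_runs():
    # Structured source: filter rows by a column value.
    return [{"date": r["date"], "km": r["km"], "source": "table"}
            for r in WORKOUTS if r["type"] == "run"]


def extract_note_runs():
    # Unstructured source: pull a distance out of free text with a regex.
    events = []
    for note in NOTES:
        match = re.search(r"(\d+(?:\.\d+)?)\s*km", note["text"])
        if "run" in note["text"].lower() and match:
            events.append({"date": note["date"],
                           "km": float(match.group(1)),
                           "source": "note"})
    return events


def union(left, right):
    return left + right


def total_km(events):
    return sum(event["km"] for event in events)


# Hand-built tree for the example question (assumed decomposition).
tree = Node("SUM(km)", total_km, [
    Node("UNION", union, [
        Node("RETRIEVE(table)", retrieve_table_runs),
        Node("EXTRACT(notes)", extract_note_runs),
    ]),
])

trace = []
answer = execute(tree, trace)
print(answer)
for name, result in trace:
    print(name, "->", result)
```

The `trace` list is what makes the answer traceable in this sketch: each operator's output is kept, so the final sum can be followed back to the individual table rows and note passages that contributed to it.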

