Collaborative LLM Numerical Reasoning with Local Data Protection
This work addresses data privacy concerns for users who rely on low-capacity local models for numerical reasoning, offering an incremental improvement over prior mitigation approaches.
The paper tackles numerical reasoning on computation-constrained devices with a framework in which local and remote models collaborate. Compared with existing data protection methods, it improves accuracy by 16.2% to 43.6% and reduces data leakage by 2.3% to 44.6%.
Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data to them raises significant data leakage concerns. Existing mitigation methods have the local model generate problem descriptions or examples that are sent for remote assistance. However, the inherent complexity of numerical reasoning makes it hard for the local model to generate logically equivalent queries and to infer accurate answers from the remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query topic while preserving the reasoning pattern; and (2) a tool-based answer reconstruction approach that reuses the plug-and-play solution, expressed as code snippets, generated by the remote model. Experimental results demonstrate that our method achieves better reasoning accuracy than using the local model alone while providing stronger data protection than relying fully on the remote model. Furthermore, compared with existing data protection approaches, our method improves accuracy by 16.2% to 43.6% while reducing data leakage by 2.3% to 44.6%.
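To make the two innovations concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a single reasoning pattern (percentage change) and stands in for both models with hypothetical helpers. `synthesize_query` uses a fixed template in place of the local model's context-aware rewriting: it drops the private topic and numbers and keeps only the computation. `mock_remote_model` pretends to be the remote model and returns a reusable `solve` function as source code. `reconstruct_answer` plays the tool-based reconstruction step, running that snippet locally on the private values. All names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LocalQuery:
    question: str              # original question, containing private context
    values: dict[str, float]   # private numbers extracted from the local document


def synthesize_query(query: LocalQuery) -> str:
    # Hypothetical stand-in for context-aware synthesis: a fixed template keeps
    # the reasoning pattern (percentage change) but shifts the topic to a
    # generic one and omits the private numbers entirely.
    params = ", ".join(query.values)
    return (
        f"For a generic product with quantities {params}, write a Python "
        f"function solve({params}) that returns the percentage change from old to new."
    )


def mock_remote_model(prompt: str) -> str:
    # Hypothetical remote model: returns a plug-and-play solution as code.
    return "def solve(old, new):\n    return (new - old) / old * 100\n"


def reconstruct_answer(snippet: str, values: dict[str, float]) -> float:
    # Tool-based reconstruction: execute the remote snippet locally and plug in
    # the private values. exec() is only acceptable for trusted code; a real
    # deployment would need a sandbox.
    namespace: dict = {}
    exec(snippet, namespace)
    return namespace["solve"](**values)


query = LocalQuery(
    question="What was the percentage change in revenue from 2019 to 2020?",
    values={"old": 200.0, "new": 250.0},
)
prompt = synthesize_query(query)           # only this leaves the device
answer = reconstruct_answer(mock_remote_model(prompt), query.values)
print(answer)  # 25.0
```

In this toy setup, only the synthesized prompt reaches the (mock) remote model, and the real figures enter only when the returned snippet runs locally. The sketch does not model how the paper generates synthetic queries or measures leakage.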