AI, CL | Sep 15, 2025

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

Danielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, Anatoly Efros

arXiv:2509.12423v111.13 citations | h-index: 61 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the need for privacy-preserving, low-cost intent understanding in intelligent agents, though it appears incremental because it builds on existing decomposition and fine-tuning techniques.

The paper tackles the problem of accurate intent extraction from UI interaction trajectories for on-device models by introducing a decomposed approach with structured summarization and fine-tuned extraction, achieving performance that surpasses large multi-modal language models.

Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models, which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach. First, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.
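The two-stage decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Interaction` and `StepSummary` schemas, the function names, and the aggregation format are all assumptions made for the example, and the two models are replaced by trivial stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Interaction:
    """One step of a UI trajectory (hypothetical schema)."""
    screen: str   # short description of what was on screen
    action: str   # what the user did on that screen


@dataclass
class StepSummary:
    """Structured summary of a single interaction (hypothetical fields)."""
    context: str
    user_action: str


def aggregate(summaries: List[StepSummary]) -> str:
    """Join per-step summaries into one numbered text block (assumed format)."""
    return "\n".join(
        f"{i}. context: {s.context}; action: {s.user_action}"
        for i, s in enumerate(summaries, start=1)
    )


def extract_intent(
    trajectory: List[Interaction],
    summarize: Callable[[Interaction], StepSummary],
    extract: Callable[[str], str],
) -> str:
    """Two-stage pipeline: per-step summarization, then intent extraction."""
    # Stage 1: summarize each interaction independently of the others.
    summaries = [summarize(step) for step in trajectory]
    # Stage 2: the extraction model sees only the aggregated summaries.
    return extract(aggregate(summaries))


# Stand-ins for the two models. In the paper's setting these would be a
# small on-device summarizer and a fine-tuned extractor; here they are toys.
def toy_summarize(step: Interaction) -> StepSummary:
    return StepSummary(context=step.screen, user_action=step.action)


def toy_extract(aggregated: str) -> str:
    step_count = len(aggregated.splitlines())
    return f"intent inferred from {step_count} summarized steps"


demo_trajectory = [
    Interaction(screen="flight search form", action="typed destination"),
    Interaction(screen="results list", action="tapped first result"),
]
demo_intent = extract_intent(demo_trajectory, toy_summarize, toy_extract)
```

Passing the two models in as callables keeps the pipeline structure separate from any particular model: the extractor never sees raw screens, only the aggregated summaries, which is the property the decomposition relies on.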
