Borislav Pavlov

AI
h-index7
3papers
27citations
Novelty47%
AI Score44

3 Papers

46.9HCMar 27
Routine Computing: A Systematic Review of Sensing Daily Life Dimensions Towards Human-Centered Goals

Borislav Pavlov, Jiajin Li, Jun Fang et al.

Human routines structure daily life, yet remain challenging for computational systems to understand. This paper presents the first systematic review of routine computing, a previously implicit but increasingly recognized field that focuses on computationally sensing and modeling human behaviors. It synthesizes 203 studies published up to August 2025. The paper presents a new taxonomy of the literature, focusing on temporal structures, behavioral interactions, cognitive aspects, and how variability and deviations are addressed. The common goals of routine computing extend across four major application domains, including accessibility care, the promotion of healthy habits, adaptive and context-aware support, and large-scale population insights. Persistent challenges that limit the design of truly human-centered systems are identified, including the gap between low-level activity recognition and high-level intent, the tension between personalization and generalization, unresolved privacy concerns, and data-related limitations. By consolidating these findings, this paper provides a foundational framework for HCI researchers, outlining principles for designing ethical, adaptive, and human-centered routine-aware systems.

AIDec 24, 2024Code
AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

Hao Wen, Shizuo Tian, Borislav Pavlov et al.

Large language models (LLMs) have brought exciting new advances to mobile UI agents, a long-standing research field that aims to complete arbitrary natural language tasks through mobile UI interactions. However, existing UI agents usually demand powerful large language models that are difficult to be deployed locally on end-users' devices, raising huge concerns about user privacy and centralized serving cost. Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter. Unlike normal coding tasks that can be extensively pre-trained with public datasets, generating UI automation code is challenging due to the diversity, complexity, and variability of target apps. Therefore, we adopt a document-centered approach that automatically builds fine-grained API documentation for each app and generates diverse task samples based on this documentation. By guiding the agent with the synthetic documents and task samples, it learns to generate precise and efficient scripts to complete unseen tasks. Based on detailed comparisons with state-of-the-art mobile UI agents, our approach effectively improves the mobile task automation with significantly higher success rates and lower latency/token consumption. Code is open-sourced at https://github.com/MobileLLM/AutoDroid-V2.

91.0CVMay 17
EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Zeyu Wang, Chang Liu, Eduardus Tjitrahardja et al.

Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to effectively leverage multimodal signals to infer users' subjective internal states. The dataset and annotations will be made publicly available to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/