67.1AIMay 5
Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural TasksLilin Xu, Bufang Yang, Siyang Jiang et al.
Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on AR glasses. We evaluate Pro$^2$Assist using a dataset curated from public sources and a real-world dataset collected on our testbed with AR glasses. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy, and it achieves up to 2.29x the proactive timing accuracy of baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.
SOC-PHAug 19, 2023
Finding emergence in data by maximizing effective informationMingzhe Yang, Zhipeng Wang, Kaiwei Liu et al.
Quantifying emergence and modeling emergent dynamics in a data-driven manner for complex dynamical systems is challenging due to the lack of direct observations at the micro-level. Thus, it's crucial to develop a framework to identify emergent phenomena and capture emergent dynamics at the macro-level using available data. Inspired by the theory of causal emergence (CE), this paper introduces a machine learning framework to learn macro-dynamics in an emergent latent space and quantify the degree of CE. The framework maximizes effective information, resulting in a macro-dynamics model with enhanced causal effects. Experimental results on simulated and real data demonstrate the effectiveness of the proposed framework. It quantifies degrees of CE effectively under various conditions and reveals distinct influences of different noise types. It can learn a one-dimensional coarse-grained macro-state from fMRI data, to represent complex neural activities during movie clip viewing. Furthermore, improved generalization to different test environments is observed across all simulation data.
AIMay 20, 2025Code
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory PerceptionsBufang Yang, Lilin Xu, Liekang Zeng et al.
Recent advances in Large Language Models (LLMs) have propelled intelligent agents from reactive responses to proactive support. While promising, existing proactive agents either rely exclusively on observations from enclosed environments (e.g., desktop UIs) with direct LLM inference or employ rule-based proactive notifications, leading to suboptimal user intent understanding and limited functionality for proactive service. In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts surrounding humans to enhance the proactivity of LLM agents. ContextAgent first extracts multi-dimensional contexts from massive sensory perceptions on wearables (e.g., video and audio) to understand user intentions. ContextAgent then leverages the sensory contexts and personas from historical data to predict the necessity for proactive services. When proactive assistance is needed, ContextAgent further automatically calls the necessary tools to assist users unobtrusively. To evaluate this new task, we curate ContextAgentBench, the first benchmark for evaluating context-aware proactive LLM agents, covering 1,000 samples across nine daily scenarios and twenty tools. Experiments on ContextAgentBench show that ContextAgent outperforms baselines by achieving up to 8.5% and 6.0% higher accuracy in proactive predictions and tool calling, respectively. We hope our research can inspire the development of more advanced, human-centric, proactive AI assistants. The code and dataset are publicly available at https://github.com/openaiotlab/ContextAgent.
AIDec 7, 2025
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent SystemsBufang Yang, Lilin Xu, Liekang Zeng et al.
Large Language Model (LLM) agents are emerging to transform daily life. However, existing LLM agents primarily follow a reactive paradigm, relying on explicit user instructions to initiate services, which increases both physical and cognitive workload. In this paper, we propose ProAgent, the first end-to-end proactive agent system that harnesses massive sensory contexts and LLM reasoning to deliver proactive assistance. ProAgent first employs a proactive-oriented context extraction approach with on-demand tiered perception to continuously sense the environment and derive hierarchical contexts that incorporate both sensory and persona cues. ProAgent then adopts a context-aware proactive reasoner to map these contexts to user needs and tool calls, providing proactive assistance. We implement ProAgent on Augmented Reality (AR) glasses with an edge server and extensively evaluate it on a real-world testbed, a public dataset, and through a user study. Results show that ProAgent achieves up to 33.4% higher proactive prediction accuracy, 16.8% higher tool-calling F1 score, and notable improvements in user satisfaction over state-of-the-art baselines, marking a significant step toward proactive assistants. A video demonstration of ProAgent is available at https://youtu.be/pRXZuzvrcVs.
AIDec 9, 2025
DeepFeature: Iterative Context-aware Feature Generation for Wearable BiosignalsKaiwei Liu, Yuting He, Bufang Yang et al.
Biosignals collected from wearable devices are widely utilized in healthcare applications. Machine learning models used in these applications often rely on features extracted from biosignals due to their effectiveness, lower data dimensionality, and wide compatibility across various model architectures. However, existing feature extraction methods often lack task-specific contextual knowledge, struggle to identify optimal feature extraction settings in high-dimensional feature space, and are prone to code generation and automation errors. In this paper, we propose DeepFeature, the first LLM-empowered, context-aware feature generation framework for wearable biosignals. DeepFeature introduces a multi-source feature generation mechanism that integrates expert knowledge with task settings. It also employs an iterative feature refinement process that uses feature assessment-based feedback for feature re-selection. Additionally, DeepFeature utilizes a robust multi-layer filtering and verification approach for robust feature-to-code translation to ensure that the extraction functions run without crashing. Experimental evaluation results show that DeepFeature achieves an average AUROC improvement of 4.21-9.67% across eight diverse tasks compared to baseline methods. It outperforms state-of-the-art approaches on five tasks while maintaining comparable performance on the remaining tasks.
AIMay 21, 2024
DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert KnowledgeBufang Yang, Siyang Jiang, Lilin Xu et al.
Large language models (LLMs) have the potential to transform digital healthcare, as evidenced by recent advances in LLM-based virtual doctors. However, current approaches rely on patient's subjective descriptions of symptoms, causing increased misdiagnosis. Recognizing the value of daily data from smart devices, we introduce a novel LLM-based multi-turn consultation virtual doctor system, DrHouse, which incorporates three significant contributions: 1) It utilizes sensor data from smart devices in the diagnosis process, enhancing accuracy and reliability. 2) DrHouse leverages continuously updating medical databases such as Up-to-Date and PubMed to ensure our model remains at diagnostic standard's forefront. 3) DrHouse introduces a novel diagnostic algorithm that concurrently evaluates potential diseases and their likelihood, facilitating more nuanced and informed medical assessments. Through multi-turn interactions, DrHouse determines the next steps, such as accessing daily data from smart devices or requesting in-lab tests, and progressively refines its diagnoses. Evaluations on three public datasets and our self-collected datasets show that DrHouse can achieve up to an 18.8% increase in diagnosis accuracy over the state-of-the-art baselines. The results of a 32-participant user study show that 75% medical experts and 91.7% patients are willing to use DrHouse.
CVApr 3, 2024
VIAssist: Adapting Multi-modal Large Language Models for Users with Visual ImpairmentsBufang Yang, Lixing He, Kaiwei Liu et al.
Individuals with visual impairments, encompassing both partial and total difficulties in visual perception, are referred to as visually impaired (VI) people. An estimated 2.2 billion individuals worldwide are affected by visual impairments. Recent advancements in multi-modal large language models (MLLMs) have showcased their extraordinary capabilities across various domains. It is desirable to help VI individuals with MLLMs' great capabilities of visual understanding and reasoning. However, it is challenging for VI people to use MLLMs due to the difficulties in capturing the desirable images to fulfill their daily requests. For example, the target object is not fully or partially placed in the image. This paper explores how to leverage MLLMs for VI individuals to provide visual-question answers. VIAssist can identify undesired images and provide detailed actions. Finally, VIAssist can provide reliable answers to users' queries based on the images. Our results show that VIAssist provides +0.21 and +0.31 higher BERTScore and ROUGE scores than the baseline, respectively.
SOC-PHDec 28, 2023
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative StudiesBing Yuan, Zhang Jiang, Aobo Lyu et al.
Emergence and causality are two fundamental concepts for understanding complex systems. They are interconnected. On one hand, emergence refers to the phenomenon where macroscopic properties cannot be solely attributed to the cause of individual properties. On the other hand, causality can exhibit emergence, meaning that new causal laws may arise as we increase the level of abstraction. Causal emergence theory aims to bridge these two concepts and even employs measures of causality to quantify emergence. This paper provides a comprehensive review of recent advancements in quantitative theories and applications of causal emergence. Two key problems are addressed: quantifying causal emergence and identifying it in data. Addressing the latter requires the use of machine learning techniques, thus establishing a connection between causal emergence and artificial intelligence. We highlighted that the architectures used for identifying causal emergence are shared by causal representation learning, causal model abstraction, and world model-based reinforcement learning. Consequently, progress in any of these areas can benefit the others. Potential applications and future perspectives are also discussed in the final section of the review.
CEOct 23, 2024
Predicting Company Growth by Econophysics informed Machine LearningRuyi Tao, Kaiwei Liu, Xu Jing et al.
Predicting company growth is crucial for strategic adjustment, operational decision-making, risk assessment, and loan eligibility reviews. Traditional models for company growth often focus too much on theory, overlooking practical forecasting, or they rely solely on time series forecasting techniques, ignoring interpretability and the inherent mechanisms of company growth. In this paper, we propose a machine learning-based prediction framework that incorporates an econophysics model for company growth. Our model captures both the intrinsic growth mechanisms of companies led by scaling laws and the fluctuations influenced by random factors and individual decisions, demonstrating superior predictive performance compared with methods that use time series techniques alone. Its advantages are more pronounced in long-range prediction tasks. By explicitly modeling the baseline growth and volatility components, our model is more interpretable.
62.0CLMar 15
SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor StreamsBufang Yang, Lilin Xu, Yixuan Li et al.
Personalization is essential for Large Language Model (LLM)-based agents to adapt to users' preferences and improve response quality and task performance. However, most existing approaches infer personas from chat histories, which capture only self-disclosed information rather than users' everyday behaviors in the physical world, limiting the ability to infer comprehensive user personas. In this work, we introduce SensorPersona, an LLM-empowered system that continuously infers stable user personas from multimodal longitudinal sensor streams unobtrusively collected from users' mobile devices. SensorPersona first performs person-oriented context encoding on continuous sensor streams to enrich the semantics of sensor contexts. It then employs hierarchical persona reasoning that integrates intra- and inter-episode reasoning to infer personas spanning physical patterns, psychosocial traits, and life experiences. Finally, it employs clustering-aware incremental verification and temporal evidence-aware updating to adapt to evolving personas. We evaluate SensorPersona on a self-collected dataset containing 1,580 hours of sensor data from 20 participants, collected over up to 3 months across 17 cities on 3 continents. Results show that SensorPersona achieves up to 31.4% higher recall in persona extraction, an 85.7% win rate in persona-aware agent responses, and notable improvements in user satisfaction compared to state-of-the-art baselines.
DCOct 17, 2025
Synera: Synergistic LLM Serving across Device and Cloud at ScaleGenglin Wang, Liekang Zeng, Bufang Yang et al.
Large Language Models (LLMs) are becoming key components in various mobile operating systems, driving smart applications like interactive chatbots and personal assistants. While bringing enhanced intelligence to mobile ends, their deployment suffers from a set of performance challenges, especially the generation quality degradation and prolonged latency. Prior works have mainly relied on solutions of cloud offloading or on-device Small Language Models (SLMs). However, the former is usually limited by the communication bottleneck, and the latter sacrifices generation quality due to resource constraints. To mitigate these limitations, this paper proposes Synera, a device-cloud synergistic LLM serving system that applies an efficient SLM-LLM synergistic mechanism. Through empirical studies on LLM's unique computing characteristics, Synera identifies a set of underexplored optimization opportunities in device-cloud synergistic LLM inference, including offloading decisions, pipeline stalls, and batching bottlenecks. To translate them into enhanced performance, Synera introduces tailored designs of communication-efficient selective offloading, stall-free parallel inference, and scalable cloud batching. Extensive evaluations with real-world testbeds show that Synera enables 1.20-5.47x better generation quality against competitive baselines with on-par latency performance. Compared with existing cloud serving, Synera achieves 8.2-16.5% lower cloud serving cost on various benchmarks.
CVMay 3, 2025
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior UnderstandingSiyang Jiang, Bufang Yang, Lilin Xu et al.
The rapid advancements in Large Vision Language Models (LVLMs) offer the potential to surpass conventional labeling by generating richer, more detailed descriptions of on-device human behavior understanding (HBU) in low-resolution vision systems, such as depth, thermal, and infrared. However, existing large vision language model (LVLM) approaches are unable to understand low-resolution data well as they are primarily designed for high-resolution data, such as RGB images. A quick fixing approach is to caption a large amount of low-resolution data, but it requires a significant amount of labor-intensive annotation efforts. In this paper, we propose a novel, labor-saving system, Llambda, designed to support low-resolution HBU. The core idea is to leverage limited labeled data and a large amount of unlabeled data to guide LLMs in generating informative captions, which can be combined with raw data to effectively fine-tune LVLM models for understanding low-resolution videos in HBU. First, we propose a Contrastive-Oriented Data Labeler, which can capture behavior-relevant information from long, low-resolution videos and generate high-quality pseudo labels for unlabeled data via contrastive learning. Second, we propose a Physical-Knowledge Guided Captioner, which utilizes spatial and temporal consistency checks to mitigate errors in pseudo labels. Therefore, it can improve LLMs' understanding of sequential data and then generate high-quality video captions. Finally, to ensure on-device deployability, we employ LoRA-based efficient fine-tuning to adapt LVLMs for low-resolution data. We evaluate Llambda using a region-scale real-world testbed and three distinct low-resolution datasets, and the experiments show that Llambda outperforms several state-of-the-art LVLM systems up to $40.03\%$ on average Bert-Score.
LGJan 25, 2022
Neural Information Squeezer for Causal EmergenceJiang Zhang, Kaiwei Liu
The classic studies of causal emergence have revealed that in some Markovian dynamical systems, far stronger causal connections can be found on the higher-level descriptions than the lower-level of the same systems if we coarse-grain the system states in an appropriate way. However, identifying this emergent causality from the data is still a hard problem that has not been solved because the correct coarse-graining strategy can not be found easily. This paper proposes a general machine learning framework called Neural Information Squeezer to automatically extract the effective coarse-graining strategy and the macro-state dynamics, as well as identify causal emergence directly from the time series data. By decomposing a coarse-graining operation into two processes: information conversion and information dropping out, we can not only exactly control the width of the information channel, but also can derive some important properties analytically including the exact expression of the effective information of a macro-dynamics. We also show how our framework can extract the dynamics on different levels and identify causal emergence from the data on several exampled systems.