Sijie Ji

h-index22

11papers

305citations

Novelty43%

AI Score48

Ranked #49,687 of 205,806 authors (top 24%)#9,800 in CL (top 30%)

11 Papers

79.8ITMay 1

Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey

Qianzhou Chen, Siqi Sun, Minrui Xu et al.

The rapid advancements in foundation models and sixth-generation (6G) wireless communication systems necessitate the development of efficient, scalable, and privacy-preserving machine learning approaches. For foundation models in 6G, split learning (SL) and aggregation learning (AL) have emerged as promising paradigms that address key challenges in distributed artificial intelligence (AI), such as communication efficiency, resource allocation, and data privacy. SL enables multiple entities to collaboratively train deep learning models by partitioning neural networks, while AL focuses on aggregating intermediate results or model updates from multiple participants, improving robustness, optimizing resource utilization, and mitigating data leakage risks. Specifically, SL is ideal for scenarios requiring strict data isolation (e.g., vertical collaborations), whereas AL suits homogeneous horizontal data settings; they can be combined to balance privacy and communication efficiency. This survey provides a comprehensive analysis of SL and AL in 6G communication systems, exploring their architectures, technical methodologies, and integration with AI-native 6G communication technologies. We examine different SL configurations, aggregation techniques, and their roles in optimizing distributed foundation models. Furthermore, we discuss their applications in emerging wireless networks, including semantic communication, reconfigurable intelligent surfaces (RIS), space-air-ground integrated networks (SAGINs), and quantum communication. By analyzing the impact of SL and AL, this survey provides insights into their role in shaping distributed AI-driven communication systems in the 6G era, focusing on efficiency, privacy preservation, and scalability.

CLSep 16, 2024

MindGuard: Towards Accessible and Sitgma-free Mental Health First Aid via Edge LLM

Sijie Ji, Xinzhe Zheng, Jiawei Sun et al.

Mental health disorders are among the most prevalent diseases worldwide, affecting nearly one in four people. Despite their widespread impact, the intervention rate remains below 25%, largely due to the significant cooperation required from patients for both diagnosis and intervention. The core issue behind this low treatment rate is stigma, which discourages over half of those affected from seeking help. This paper presents MindGuard, an accessible, stigma-free, and professional mobile mental healthcare system designed to provide mental health first aid. The heart of MindGuard is an innovative edge LLM, equipped with professional mental health knowledge, that seamlessly integrates objective mobile sensor data with subjective Ecological Momentary Assessment records to deliver personalized screening and intervention conversations. We conduct a broad evaluation of MindGuard using open datasets spanning four years and real-world deployment across various mobile devices involving 20 subjects for two weeks. Remarkably, MindGuard achieves results comparable to GPT-4 and outperforms its counterpart with more than 10 times the model size. We believe that MindGuard paves the way for mobile LLM applications, potentially revolutionizing mental healthcare practices by substituting self-reporting and intervention conversations with passive, integrated monitoring within daily life, thus ensuring accessible and stigma-free mental health support.

ITAug 24, 2022

Enhancing Deep Learning Performance of Massive MIMO CSI Feedback

Sijie Ji, Mo Li

CSI feedback is an important problem of Massive multiple-input multiple-output (MIMO) technology because the feedback overhead is proportional to the number of sub-channels and the number of antennas, both of which scale with the size of the Massive MIMO system. Deep learning-based CSI feedback methods have been widely adopted recently owing to their superior performance. Despite the success, current approaches have not fully exploited the relationship between the characteristics of CSI data and the deep learning framework. In this paper, we propose a jigsaw puzzles aided training strategy (JPTS) to enhance the deep learning-based Massive MIMO CSI feedback approaches by maximizing mutual information between the original CSI and the compressed CSI. We apply JPTS on top of existing state-of-the-art methods. Experimental results show that by adopting this training strategy, the accuracy can be boosted by 12.07% and 7.01% on average in indoor and outdoor environments, respectively. The proposed method is ready to adopt to existing deep learning frameworks of Massive MIMO CSI feedback. Codes of JPTS are available on GitHub for use.

ROApr 13, 2024Code

NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

Xinzhe Zheng, Sijie Ji, Yipeng Pan et al.

Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost inertial measurement units and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor have they maximized the potential of deep learning to achieve the desired accuracy. To address these limitations, we introduce NeurIT, which elevates tracking accuracy to a new level. NeurIT employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining both RNN and Transformer to learn representative features in both time and frequency domains. To fully utilize IMU information, we strategically employ body-frame differentiation of magnetometers, considerably reducing the tracking error. We implement NeurIT on a customized robotic platform and conduct evaluation in various indoor environments. Experimental results demonstrate that NeurIT achieves a mere 1-meter tracking error over a 300-meter distance. Notably, it significantly outperforms state-of-the-art baselines by 48.21% on unseen data. Moreover, NeurIT demonstrates robustness in large urban complexes and performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions while surpassing it in feature-sparse settings. We believe NeurIT takes an important step forward toward practical neural inertial tracking for ubiquitous and scalable tracking of robotic things. NeurIT is open-sourced here: https://github.com/aiot-lab/NeurIT.

84.9AIMay 13

How to Interpret Agent Behavior

Jie Gao, Kaiser Sun, Jen-tse Huang et al.

Autonomous agents such as Claude Code and Codex now operate for hours or even days. Understanding their runtime behavior has become critical for downstream tasks such as diagnosing inefficiencies, fixing bugs, and ensuring better oversight. A primary way to gain this understanding is analyzing the reasoning trajectories and execution traces these agents generate. Yet such data remains in unstructured natural-language form, making it difficult for humans to interpret at scale. We introduce ACT*ONOMY (a combination of Action and Taxonomy), a taxonomy for describing and analyzing agent behavior at runtime. ACT*ONOMY has two components: (1) the taxonomy itself, developed through Grounded Theory and structured as a three-level hierarchy of 10 actions, 46 subactions, and 120 leaf categories; and (2) an open repository that hosts the living taxonomy, provides an automated analysis pipeline that applies it to agent trajectories analysis, and defines an extension protocol for customization and growth. Our experiments show that ACTONOMY can compare behavioral profiles across agents and characterize a single agent's behavior across diverse trajectories, surfacing patterns indicative of failure modes. By providing a shared vocabulary, ACT*ONOMY helps researchers, agent designers, and end users interpret agent behavior more consistently, enabling better oversight and control.

89.6NEMay 11

Frequency Matching in Spiking Neural Networks for mmWave Sensing

Di Yu, Zhenyu Liao, Changze Lv et al.

Millimeter-wave (mmWave) sensing enables privacy-preserving, always-on edge perception, but its measurements are often sparse, temporally irregular, and corrupted by high-frequency noise. Existing mmWave pipelines predominantly rely on artificial neural networks (ANNs), which achieve robustness through extensive preprocessing or deep architectures, thereby limiting their efficiency on edge devices. In this work, we study spiking neural networks (SNNs) for mmWave sensing from a mechanism-data alignment perspective. By leveraging the low-pass filtering behavior of leaky integrate-and-fire (LIF) dynamics, we analyze how their implicit temporal filtering interacts with the frequency structure of mmWave signals. Our analysis shows that when discriminative information resides in low-to-mid frequencies, LIF dynamics can inherently suppress high-frequency noise, clarifying when and why SNNs outperform ANNs. Based on this insight, we derive a principled criterion for configuring the membrane decay factor by matching the effective bandwidth of LIF dynamics to the data's discriminative spectral content. Experimental results across four widely used mmWave datasets validate the proposed frequency-matching hypothesis, yielding an average test-accuracy improvement of 6.22% and a 3.64$\times$ reduction in theoretical energy consumption relative to ANN baselines, under a unified evaluation protocol.

CLMar 5, 2024

HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?

Sijie Ji, Xinzhe Zheng, Chenshu Wu

There is an ongoing debate regarding the potential of Large Language Models (LLMs) as foundational models seamlessly integrated with Cyber-Physical Systems (CPS) for interpreting the physical world. In this paper, we carry out a case study to answer the following question: Are LLMs capable of zero-shot human activity recognition (HAR). Our study, HARGPT, presents an affirmative answer by demonstrating that LLMs can comprehend raw IMU data and perform HAR tasks in a zero-shot manner, with only appropriate prompts. HARGPT inputs raw IMU data into LLMs and utilizes the role-play and think step-by-step strategies for prompting. We benchmark HARGPT on GPT4 using two public datasets of different inter-class similarities and compare various baselines both based on traditional machine learning and state-of-the-art deep classification models. Remarkably, LLMs successfully recognize human activities from raw IMU data and consistently outperform all the baselines on both datasets. Our findings indicate that by effective prompting, LLMs can interpret raw IMU data based on their knowledge base, possessing a promising potential to analyze raw sensor data of the physical world effectively.

CLMar 10, 2024

Are You Being Tracked? Discover the Power of Zero-Shot Trajectory Tracing with LLMs!

Huanqi Yang, Sijie Ji, Rucheng Wu et al.

There is a burgeoning discussion around the capabilities of Large Language Models (LLMs) in acting as fundamental components that can be seamlessly incorporated into Artificial Intelligence of Things (AIoT) to interpret complex trajectories. This study introduces LLMTrack, a model that illustrates how LLMs can be leveraged for Zero-Shot Trajectory Recognition by employing a novel single-prompt technique that combines role-play and think step-by-step methodologies with unprocessed Inertial Measurement Unit (IMU) data. We evaluate the model using real-world datasets designed to challenge it with distinct trajectories characterized by indoor and outdoor scenarios. In both test scenarios, LLMTrack not only meets but exceeds the performance benchmarks set by traditional machine learning approaches and even contemporary state-of-the-art deep learning models, all without the requirement of training on specialized datasets. The results of our research suggest that, with strategically designed prompts, LLMs can tap into their extensive knowledge base and are well-equipped to analyze raw sensor data with remarkable effectiveness.

LGFeb 13, 2025

One-shot Federated Learning Methods: A Practical Guide

Xiang Liu, Zhenheng Tang, Xia Li et al.

One-shot Federated Learning (OFL) is a distributed machine learning paradigm that constrains client-server communication to a single round, addressing privacy and communication overhead issues associated with multiple rounds of data exchange in traditional Federated Learning (FL). OFL demonstrates the practical potential for integration with future approaches that require collaborative training models, such as large language models (LLMs). However, current OFL methods face two major challenges: data heterogeneity and model heterogeneity, which result in subpar performance compared to conventional FL methods. Worse still, despite numerous studies addressing these limitations, a comprehensive summary is still lacking. To address these gaps, this paper presents a systematic analysis of the challenges faced by OFL and thoroughly reviews the current methods. We also offer an innovative categorization method and analyze the trade-offs of various techniques. Additionally, we discuss the most promising future directions and the technologies that should be integrated into the OFL field. This work aims to provide guidance and insights for future research.

AIMay 20, 2025

ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data

Xinzhe Zheng, Sijie Ji, Jiawei Sun et al.

Mental health risk is a critical global public health challenge, necessitating innovative and reliable assessment methods. With the development of large language models (LLMs), they stand out to be a promising tool for explainable mental health care applications. Nevertheless, existing approaches predominantly rely on subjective textual mental records, which can be distorted by inherent mental uncertainties, leading to inconsistent and unreliable predictions. To address these limitations, this paper introduces ProMind-LLM. We investigate an innovative approach integrating objective behavior data as complementary information alongside subjective mental records for robust mental health risk assessment. Specifically, ProMind-LLM incorporates a comprehensive pipeline that includes domain-specific pretraining to tailor the LLM for mental health contexts, a self-refine mechanism to optimize the processing of numerical behavioral data, and causal chain-of-thought reasoning to enhance the reliability and interpretability of its predictions. Evaluations of two real-world datasets, PMData and Globem, demonstrate the effectiveness of our proposed methods, achieving substantial improvements over general LLMs. We anticipate that ProMind-LLM will pave the way for more dependable, interpretable, and scalable mental health case solutions.

ITFeb 15, 2021

CLNet: Complex Input Lightweight Neural Network designed for Massive MIMO CSI Feedback

Sijie Ji, Mo Li

Unleashing the full potential of massive MIMO in FDD mode by reducing the overhead of CSI feedback has recently garnered attention. Numerous deep learning for massive MIMO CSI feedback approaches have demonstrated their efficiency and potential. However, most existing methods improve accuracy at the cost of computational complexity and the accuracy decreases significantly as the CSI compression rate increases. This paper presents a novel neural network CLNet tailored for CSI feedback problem based on the intrinsic properties of CSI. CLNet proposes a forge complex-valued input layer to process signals and utilizes attention mechanism to enhance the performance of the network. The experiment result shows that CLNet outperforms the state-of-the-art method by average accuracy improvement of 5.41\% in both outdoor and indoor scenarios with average 24.1\% less computational overhead. Codes for deep learning-based CSI feedback CLNet are available at GitHub.