LGSep 28, 2023Code
A Comprehensive View of Personalized Federated Learning on Heterogeneous Clinical DatasetsFatemeh Tavakoli, D. B. Emerson, Sana Ayromlou et al.
Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, we expand the FLamby benchmark (du Terrail et al., 2022a) to include a comprehensive evaluation of personalized FL methods and demonstrate substantive performance improvements over the original results. Next, we advocate for a comprehensive checkpointing and evaluation framework for FL to reflect practical settings and provide multiple comparison baselines. To this end, an open-source library aimed at making FL experimentation simpler and more reproducible is released. Finally, we propose an important ablation of PerFCL (Zhang et al., 2022). This ablation results in a natural extension of FENDA (Kim et al., 2016) to the FL setting. Experiments conducted on the FLamby benchmark and GEMINI datasets (Verma et al., 2017) show that the proposed approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques, including PerFCL.
HCMay 16
Gesture First, LLM-Assisted Voice Complement: Exploring Multimodal Robot 'Puppeteer' Teleoperation Via Virtual Counterpart in Augmented RealityYuchong Zhang, Bastian Orthmann, Shichen Ji et al.
Robot teleoperation via augmented reality (AR) offers a promising path toward more intuitive human-robot interaction (HRI). We present a head-mounted AR 'puppeteer' system in which users control a physical robot by interacting with its virtual counterpart robot using large language model (LLM)-assisted voice commands and hand-gesture interaction on the Meta Quest 3. In a within-subject user study with 42 participants performing an AR-based robotic pick-and-place pattern-matching task, we empirically compare two interaction conditions: gesture-only (GO) and combined voice+gesture (VG) on performance and user experience (UX). In VG, voice and gesture operate in a sequential role-allocated manner, with voice handling high-level navigation and gesture handling fine manipulation. Our results show that GO currently provides more reliable and efficient control for this time-critical task, while VG introduces additional flexibility but also latency and recognition issues that can increase workload. We additionally analyze how prior robotics expertise differentiates performance and UX across conditions. Based on these findings, we distill a set of design guidelines for AR 'puppeteer' metaphoric robot teleoperation, framing multimodality as an adaptive strategy that must balance efficiency, robustness, and user expertise rather than assuming that additional modalities are universally beneficial.
ROMay 23
Elevator-LIO: Robust LiDAR-Inertial Odometry for Multi-Floor Navigation under Elevator-Induced Non-Inertial MotionYifan Zhang, Yudong Huang, Yuchong Zhang et al.
This paper presents Elevator-LIO, a LiDAR-inertial odometry framework designed to achieve continuous robot localization during elevator travel, thereby supporting cross-floor robotic tasks. To address the state-estimation problem in non-inertial frames, Elevator-LIO establishes a decoupled state-estimation model that separately models the robot motion relative to the elevator and the elevator motion itself, and embeds it into a mode-dependent iterated error-state Kalman filter framework. This framework degenerates to conventional LIO estimation in ordinary indoor environments, while enabling the propagation and constrained update of elevator-related states in elevator non-inertial environments, thereby achieving continuous and stable localization. An elevator mode manager detects elevator entry and exit events using LiDAR ranging statistics and estimated states, and introduces event-triggered zero-velocity and zero-acceleration updates when the elevator stops to suppress accumulated vertical drift. In addition, this paper adopts an adaptive voxel downsampling strategy to maintain a stable number of effective points under significant environmental scale changes. We conduct extensive experiments on 20 real-world sequences containing 79 elevator rides, including practical challenges such as large-scale spaces, long vertical travel, dynamic pedestrian interference, and mirror reflections. The results show that Elevator-LIO maintains continuous localization accuracy in all sequences, with terminal height error below 1 cm in 17 sequences. In contrast, existing representative localization systems perform poorly on these elevator sequences. Tests on the Hilti 2022/2023 datasets further show that the proposed method remains competitive in standard indoor scenarios. The project page is available at https://xiaofan4122.github.io/Elevator_LIO_Page/.
LGMay 5, 2022
GANs as Gradient Flows that ConvergeYu-Jui Huang, Yuchong Zhang
This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. A main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data distribution emerges as the long-time limit. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, the simulation of the ODE is shown equivalent to the training of generative adversarial networks (GANs). This equivalence provides a new "cooperative" view of GANs and, more importantly, sheds new light on the divergence of GANs. In particular, it reveals that the GAN algorithm implicitly minimizes the mean squared error (MSE) between two sets of samples, and this MSE fitting alone can cause GANs to diverge. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, by the Crandall-Liggett theorem for differential equations in Banach spaces. Based on this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, using Trevisan's superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.
LGFeb 10
The Laplacian Mechanism Improves Transformers by Reshaping Token GeometryYuchong Zhang, Vardan Papyan
Transformers leverage attention, the residual connection, and layer normalization to control the variance of token representations. We propose to modify attention into a Laplacian mechanism that gives the model more direct control over token variance. We conjecture that this helps transformers achieve the ideal token geometry. To investigate our conjecture, we first show that incorporating the Laplacian mechanism into transformers induces consistent improvements across benchmarks in computer vision and language. Next, we study how the Laplacian mechanism impacts the geometry of token representations using various tools: 1) principal component analysis, 2) cosine similarity metric, 3) analysis of variance, and 4) Neural Collapse metrics. Our investigation shows that the Laplacian mechanism reshapes token embeddings toward a geometry of maximal separability: tokens collapse according to their classes, and the class means exhibit Neural Collapse.
HCDec 11, 2025
Reframing Human-Robot Interaction Through Extended Reality: Unlocking Safer, Smarter, and More Empathic Interactions with Virtual Robots and Foundation ModelsYuchong Zhang, Yong Ma, Danica Kragic
This perspective reframes human-robot interaction (HRI) through extended reality (XR), arguing that virtual robots powered by large foundation models (FMs) can serve as cognitively grounded, empathic agents. Unlike physical robots, XR-native agents are unbound by hardware constraints and can be instantiated, adapted, and scaled on demand, while still affording embodiment and co-presence. We synthesize work across XR, HRI, and cognitive AI to show how such agents can support safety-critical scenarios, socially and cognitively empathic interaction across domains, and outreaching physical capabilities with XR and AI integration. We then discuss how multimodal large FMs (e.g., large language model, large vision model, and vision-language model) enable context-aware reasoning, affect-sensitive situations, and long-term adaptation, positioning virtual robots as cognitive and empathic mediators rather than mere simulation assets. At the same time, we highlight challenges and potential risks, including overtrust, cultural and representational bias, privacy concerns around biometric sensing, and data governance and transparency. The paper concludes by outlining a research agenda for human-centered, ethically grounded XR agents - emphasizing multi-layered evaluation frameworks, multi-user ecosystems, mixed virtual-physical embodiment, and societal and ethical design practices to envision XR-based virtual agents powered by FMs as reshaping future HRI into a more efficient and adaptive paradigm.
NIMar 6
A Dual-AoI-based Approach for Optimal Transmission Scheduling in Wireless Monitoring Systems with Random Data ArrivalsYuchong Zhang, Yi Cao, Xianghui Cao
In Internet of Things (IoTs), the freshness of system status information is crucial for real-time monitoring and decision-making. This paper studies the transmission scheduling problem in wireless monitoring systems, where information freshness -- typically quantified by the Age of Information (AoI) -- is heavily constrained by limited channel resources and influenced by factors such as the randomness of data arrivals and unreliable wireless channel. Such randomness leads to asynchronous AoI evolution at local sensors and the monitoring center, rendering conventional scheduling policies that rely solely on the monitoring center's AoI inefficient. To this end, we propose a dual-AoI model that captures asynchronous AoI dynamics and formulate the problem as minimizing a long-term time-average AoI function. We develop a scheduling policy based on Markov decision process (MDP) to solve the problem, and analyze the existence and monotonicity of a deterministic stationary optimal policy. Moreover, we derive a low-complexity scheduling policy which exhibits a channel-state-dependent threshold structure. In addition, we establish a necessary and sufficient condition for the stability of the AoI objective. Simulation results demonstrate that the proposed policy outperforms existing approaches.
AIOct 10, 2025
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMsXi Fang, Weijie Xu, Yuchong Zhang et al. · amazon-science
When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated user independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion understanding and supportive recommendations tasks, indicating that personalization mechanisms can embed social hierarchies into models emotional reasoning. These results highlight a key challenge for memory enhanced AI: systems designed for personalization may inadvertently reinforce social inequalities.