ROMay 28Code
A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant ParadigmsYufei Jia, Zhanxiang Cao, Mingrui Yu et al.
Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU. We revisit this assumption. Our view is that, in simulation-dominated robot control, the essential question is not which processor runs physics, but whether simulation throughput, policy learning, and runtime synchronization form an efficient end-to-end loop. We present UniLab, a heterogeneous CPU-simulation / GPU-learning architecture that decouples CPU-parallel simulation from GPU policy updates through a unified runtime for data movement, buffering, and synchronization. UniLab is implemented as a complete and extensible training system using MuJoCoUni and MotrixSim CPU-batched physics backends, supporting PPO, SAC, FlashSAC, TD3, and APPO. On representative simulation-based robot control tasks, UniLab improves end-to-end training efficiency by 3--10$\times$ under the same hardware configuration, while reducing dependence on the NVIDIA CUDA-based software stack and supporting cross-platform execution on the Apple macOS platform and the AMD ROCm and Intel XPU accelerator backends. These results show that GPU simulation is an effective path to efficient training, but not a necessary one, broadening the practical system choices available for robot RL training. Project page: https://github.com/unilabsim/UniLab.
HCApr 30, 2023
Towards AI-Architecture Liberty: A Comprehensive Survey on Design and Generation of Virtual Architecture by Deep LearningAnqi Wang, Jiahua Dong, Lik-Hang Lee et al.
3D shape generation techniques leveraging deep learning have garnered significant interest from both the computer vision and architectural design communities, promising to enrich the content in the virtual environment. However, research on virtual architectural design remains limited, particularly regarding designer-AI collaboration and deep learning-assisted design. In our survey, we reviewed 149 related articles (81.2% of articles published between 2019 and 2023) covering architectural design, 3D shape techniques, and virtual environments. Through scrutinizing the literature, we first identify the principles of virtual architecture and illuminate its current production challenges, including datasets, multimodality, design intuition, and generative frameworks. We then introduce the latest approaches to designing and generating virtual buildings leveraging 3D shape generation and summarize four characteristics of various approaches to virtual architecture. Based on our analysis, we expound on four research agendas, including agency, communication, user consideration, and integrating tools. Additionally, we highlight four important enablers of ubiquitous interaction with immersive systems in deep learning-assisted architectural generation. Our work contributes to fostering understanding between designers and deep learning techniques, broadening access to designer-AI collaboration. We advocate for interdisciplinary efforts to address this timely research topic, facilitating content designing and generation in the virtual environment.
ROApr 28
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot LearningYufei Jia, Heng Zhang, Ziheng Zhang et al.
Embodied AI research is undergoing a shift toward vision-centric perceptual paradigms. While massively parallel simulators have catalyzed breakthroughs in proprioception-based locomotion, their potential remains largely untapped for vision-informed tasks due to the prohibitive computational overhead of large-scale photorealistic rendering. Furthermore, the creation of simulation-ready 3D assets heavily relies on labor-intensive manual modeling, while the significant sim-to-real physical gap hinders the transfer of contact-rich manipulation policies. To address these bottlenecks, we propose GS-Playground, a multi-modal simulation framework designed to accelerate end-to-end perceptual learning. We develop a novel high-performance parallel physics engine, specifically designed to integrate with a batch 3D Gaussian Splatting (3DGS) rendering pipeline to ensure high-fidelity synchronization. Our system achieves a breakthrough throughput of 10^4 FPS at 640x480 resolution, significantly lowering the barrier for large-scale visual RL. Additionally, we introduce an automated Real2Sim workflow that reconstructs photorealistic, physically consistent, and memory-efficient environments, streamlining the generation of complex simulation-ready scenes. Extensive experiments on locomotion, navigation, and manipulation demonstrate that GS-Playground effectively bridges the perceptual and physical gaps across diverse embodied tasks. Project homepage: https://gsplayground.github.io.
HCApr 12
CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning TasksAnqi Wang, Dongyijie Pan, Xin Tong et al.
Although Large Language Models (LLMs) demonstrate proficiency in knowledge-intensive tasks, current interfaces frequently precipitate cognitive misalignment by failing to externalize users' underlying reasoning structures. Existing tools typically represent intent as "flat lists," thereby disregarding the causal dependencies and revisable assumptions inherent in human decision-making. We introduce CogInstrument, a system that represents user reasoning through cognitive motifs-compositional, revisable units comprising concepts linked by causal dependencies. CogInstrument extracts these motifs from natural language interactions and renders them as editable graphical structures to facilitate bidirectional alignment. This structural externalization enables both the user and the LLM to inspect, negotiate, and reconcile reasoning processes iteratively. A within-subjects study (N=12) demonstrates that CogInstrument explicitly surfaces implicit reasoning structures, facilitating more targeted revision and reusability over conventional LLM-based dialogue interfaces. By enabling users to verify the logical grounding of LLM outputs, CogInstrument significantly enhances user agency, trust, and structural control over the collaboration. This work formalizes cognitive motifs as a fundamental unit for human-LLM alignment, providing a novel framework for achieving structured, reasoning-based human-AI collaboration.
HCApr 12
NexusAI: Enabling Design Space Exploration of Ideas through Cognitive Abstraction and Functional DecompositionAnqi Wang, Bingqian Wang, Huiyang Chen et al.
Large Language Models (LLMs) offer vast potential for creative ideation; however, their standard interaction paradigm often produces unstructured textual outputs that lead users to prematurely converge on sub-optimal ideas-a phenomenon known as fixation. While recent creativity tools have begun to structure these outputs, they remain compositionally opaque: ideas are organized as monolithic units that cannot be decomposed, abstracted, or recombinable at a sub-idea level. To address this, we propose Cognitive Abstraction (CA), a computational pipeline that transforms raw LLM-generated inspiration into a navigable and transformable design space. We implement this pipeline in NexusAI, a prototype diagramming system that supports (I) decomposition of inspiration into typed functional fragments, (II) multi-level abstraction to externalize mental scaling, and (III) cross-dimensional recombination to spark novel design directions. A within-subject user study (N=14) demonstrates that NexusAI significantly improves design space exploration, reduces cognitive overhead, and facilitates perspective reframing compared to a baseline. Our work contributes: (1) a characterization of "compositional opacity" as a barrier in human-AI co-creation; (2) the CA pipeline for operationalizing creative cognitive primitives at scale; and (3) empirical evidence that structured, multi-level representations can effectively mitigate fixation and support divergent exploration.
HCMar 19
Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in MetaverseAnqi Wang, Lei Han, Jiahua Dong et al.
Digital platforms frequently reproduce heteronormative norms and structural biases, limiting inclusive communication between LGBTQ+ and cisgender individuals. The Metaverse, with its affordances for identity fluidity, presence, and community governance, offers a promising site for reimagining such interactions. To investigate this potential, we conducted participatory design workshops involving LGBTQ+ and cisgender participants, situating them in speculative Metaverse contexts to surface barriers and co-create alternative futures. The workshops followed a three-phase process-identifying challenges, speculative problem-solving, and visualizing futures-yielding socio-spatial-technical solutions across four layers: activity, interaction, scene, and space. These findings highlight the importance of spatial cues and power dynamics in shaping digital encounters. We contribute by (1) articulating challenges of cross-group communication in virtual environments, (2) proposing inclusive design opportunities for the Metaverse, and (3) advancing principles for addressing power geometry in digital space. This work demonstrates futuring as a critical strategy for designing equitable, transformative communication infrastructures.
CYMar 16
Hyper-learning and Unlearning: A Narrative Speculation on Urbanism in Media EcologiesAnqi Wang, Yue Hua, Xinyue Zhang et al.
Hyper-learning and Unlearning is a speculative animation that reflect how learning is reconfigured within digital media ecologies. Using architectural education as a microcosm, the work reframes the city as a hyper-learning apparatus where urban space, algorithmic systems, and platform infrastructures condition cognition and agency. By staging both hyper-learning and the unlearning induced by machine-supported cognition, the work critiques institutional gatekeeping while revealing how platforms reshape expertise, memory, and spatial experience. This project invites viewers to reconsider how urban space becomes pedagogical infrastructure in a posthumanism era.
CVOct 29, 2024
Discriminative Pedestrian Features and Gated Channel Attention for Clothes-Changing Person Re-IdentificationYongkang Ding, Rui Mao, Hanyue Zhu et al.
In public safety and social life, the task of Clothes-Changing Person Re-Identification (CC-ReID) has become increasingly significant. However, this task faces considerable challenges due to appearance changes caused by clothing alterations. Addressing this issue, this paper proposes an innovative method for disentangled feature extraction, effectively extracting discriminative features from pedestrian images that are invariant to clothing. This method leverages pedestrian parsing techniques to identify and retain features closely associated with individual identity while disregarding the variable nature of clothing attributes. Furthermore, this study introduces a gated channel attention mechanism, which, by adjusting the network's focus, aids the model in more effectively learning and emphasizing features critical for pedestrian identity recognition. Extensive experiments conducted on two standard CC-ReID datasets validate the effectiveness of the proposed approach, with performance surpassing current leading solutions. The Top-1 accuracy under clothing change scenarios on the PRCC and VC-Clothes datasets reached 64.8% and 83.7%, respectively.
HCFeb 15, 2024
Pinning "Reflection" on the Agenda: Investigating Reflection in Human-LLM Co-Creation for Creative CodingAnqi Wang, Zhizhuo Yin, Yulu Hu et al.
Large language models (LLMs) are increasingly integrated into creative coding, yet how users reflect, and how different co-creation conditions influence reflective behavior, remains underexplored. This study investigates situated, moment-to-moment reflection in creative coding under two prompting strategies: the entire task invocation (T1) and decomposed subtask invocation (T2), to examine their effects on reflective behavior. Our mixed-method results reveal three distinct reflection types and show that T2 encourages more frequent, strategic, and generative reflection, fostering diagnostic reasoning and goal redefinition. These findings offer insights into how LLM-based tools foster deeper creative engagement through structured, behaviorally grounded reflection support.
CVJan 26, 2022
Infrared and visible image fusion based on Multi-State Contextual Hidden Markov ModelXiaoqing Luo, Yuting Jiang, Anqi Wang et al.
The traditional two-state hidden Markov model divides the high frequency coefficients only into two states (large and small states). Such scheme is prone to produce an inaccurate statistical model for the high frequency subband and reduces the quality of fusion result. In this paper, a fine-grained multi-state contextual hidden Markov model (MCHMM) is proposed for infrared and visible image fusion in the non-subsampled Shearlet domain, which takes full consideration of the strong correlations and level of details of NSST coefficients. To this end, an accurate soft context variable is designed correspondingly from the perspective of context correlation. Then, the statistical features provided by MCHMM are utilized for the fusion of high frequency subbands. To ensure the visual quality, a fusion strategy based on the difference in regional energy is proposed as well for lowfrequency subbands. Experimental results demonstrate that the proposed method can achieve a superior performance compared with other fusion methods in both subjective and objective aspects.