LG | Jul 4, 2022 | Code
Federated Split GANs
Pranvera Kortoçi, Yilei Liang, Pengyuan Zhou et al.
Mobile devices and the immense amount and variety of data they generate are key enablers of machine learning (ML)-based applications. Traditional ML techniques have shifted toward new paradigms such as federated learning (FL) and split learning (SL) to improve the protection of users' data privacy. However, these paradigms often rely on servers located at the edge or in the cloud to train the computationally heavy parts of an ML model and avoid draining the limited resources of client devices, which exposes device data to such third parties. This work proposes an alternative approach that trains computationally heavy ML models on the users' devices themselves, where the corresponding data resides. Specifically, we focus on GANs (generative adversarial networks) and leverage their inherent privacy-preserving attribute. We train the discriminative part of a GAN with raw data on users' devices, whereas the generative model is trained remotely (e.g., on a server), since it does not need access to the true sensor data. Moreover, our approach ensures that the computational load of training the discriminative model is shared among users' devices, in proportion to their computation capabilities, by means of SL. We implement our proposed collaborative training scheme for a computationally heavy GAN model on real resource-constrained devices. The results show that our system preserves data privacy, keeps training time short, and yields the same accuracy as model training on unconstrained devices (e.g., in the cloud). Our code can be found at https://github.com/YukariSonz/FSL-GAN
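The abstract states that the discriminator's training load is shared among devices in proportion to their computation capabilities. A minimal sketch of one way such a split could be computed is given below; the function name `split_layers`, the capability scores, and the largest-remainder rounding are illustrative assumptions, not the authors' implementation.

```python
def split_layers(num_layers, capabilities):
    """Assign contiguous discriminator layers to devices.

    Hypothetical helper: each device receives a share of layers roughly
    proportional to its capability score (any positive number).
    Returns a list of (start, end) half-open layer ranges, one per device.
    """
    total = sum(capabilities)
    # Ideal fractional share of layers for each device.
    shares = [num_layers * c / total for c in capabilities]
    counts = [int(s) for s in shares]
    # Largest-remainder rounding: give leftover layers to the devices
    # whose fractional share was truncated the most.
    leftover = num_layers - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Turn per-device counts into contiguous layer ranges.
    ranges, start = [], 0
    for n in counts:
        ranges.append((start, start + n))
        start += n
    return ranges


# Three devices, the third twice as capable as the other two.
print(split_layers(8, [1, 1, 2]))
```

A device with a very small capability score can end up with an empty range here; a real system would need to decide whether to drop such a device or guarantee it a minimum share.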
AI | Mar 21, 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang, Chenshuang Zhang, Sheng Zheng et al.
As ChatGPT goes viral, generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. With such overwhelming media coverage, it is almost impossible for us to miss the opportunity to glimpse AIGC from a certain angle. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just one tool among numerous AIGC tasks. Impressed by the capability of ChatGPT, many people are wondering about its limits: can GPT-5 (or other future GPT variants) help ChatGPT unify all AIGC tasks for diversified content creation? Toward answering this question, a comprehensive review of existing AIGC tasks is needed. As such, our work fills this gap promptly by offering a first look at AIGC, ranging from its techniques to applications. Modern generative AI relies on various technical foundations, ranging from model architecture and self-supervised pretraining to generative modeling methods (like GAN and diffusion models). After introducing the fundamental techniques, this work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc., which depicts the full potential of ChatGPT's future. Moreover, we summarize their significant applications in some mainstream industries, such as education and creative content. Finally, we discuss the challenges currently faced and present an outlook on how generative AI might evolve in the near future.
CY | Apr 4, 2023
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
Chaoning Zhang, Chenshuang Zhang, Chenghao Li et al.
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.
SI | Mar 6, 2022
Twitter Dataset for 2022 Russo-Ukrainian Crisis
Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee et al.
Online Social Networks (OSNs) play a significant role in information sharing during a crisis. The data collected during such a crisis can reflect large-scale public opinion and sentiment. In addition, OSN data can also be used to study different campaigns that are employed by various entities to engineer public opinion. Such information sharing campaigns can range from spreading factual information to propaganda and misinformation. We provide a Twitter dataset of the 2022 Russo-Ukrainian conflict. In the first release, we share over 1.6 million tweets posted during the first week of the crisis.
38.9 | MM | Mar 16
Multimodal Cyber-physical Interaction in XR: Hybrid Doctoral Thesis Defense
Ahmad Alhilal, Kit Yung Lam, Lik-Hang Lee et al.
Academic events, such as a doctoral thesis defense, are typically limited to either physical co-location or flat video conferencing, resulting in rigid participation formats and fragmented presence. We present a multimodal framework that breaks this binary by supporting a spectrum of participation - from in-person attendance to immersive virtual reality (VR) or browser access - and report our findings from using it to organize the first ever hybrid doctoral thesis defense using extended reality (XR). The framework integrates full-body motion tracking to synchronize the user's avatar motions and gestures, enabling natural interaction with onsite participants as well as body language and gestures with remote attendees in the virtual world. It leverages WebXR to provide cross-platform and instant accessibility with easy setup. User feedback analysis reveals positive VR experiences and demonstrates the framework's effectiveness in supporting various hybrid event activities.
HC | Apr 30, 2023
Towards AI-Architecture Liberty: A Comprehensive Survey on Design and Generation of Virtual Architecture by Deep Learning
Anqi Wang, Jiahua Dong, Lik-Hang Lee et al.
3D shape generation techniques leveraging deep learning have garnered significant interest from both the computer vision and architectural design communities, promising to enrich the content in the virtual environment. However, research on virtual architectural design remains limited, particularly regarding designer-AI collaboration and deep learning-assisted design. In our survey, we reviewed 149 related articles (81.2% of articles published between 2019 and 2023) covering architectural design, 3D shape techniques, and virtual environments. Through scrutinizing the literature, we first identify the principles of virtual architecture and illuminate its current production challenges, including datasets, multimodality, design intuition, and generative frameworks. We then introduce the latest approaches to designing and generating virtual buildings leveraging 3D shape generation and summarize four characteristics of various approaches to virtual architecture. Based on our analysis, we expound on four research agendas, including agency, communication, user consideration, and integrating tools. Additionally, we highlight four important enablers of ubiquitous interaction with immersive systems in deep learning-assisted architectural generation. Our work contributes to fostering understanding between designers and deep learning techniques, broadening access to designer-AI collaboration. We advocate for interdisciplinary efforts to address this timely research topic, facilitating content designing and generation in the virtual environment.
48.9 | HC | Mar 29
Conflict Resolution Strategies for Co-manipulation of Virtual Objects Under Non-disjoint Conditions
Xian Wang, Xuanru Cheng, Rongkai Shi et al.
Virtual Reality (VR) co-manipulation enables multiple users to collaboratively interact with shared virtual objects. However, existing research treats objects as monolithic entities, overlooking scenarios where users need to manipulate different sub-components simultaneously. This work addresses conflict resolution when users select overlapping vertices (non-disjoint sets) during co-manipulation. We present a comprehensive framework comprising preventive strategies (Object-level and Action-level Restrictions) and reactive strategies (computational conflict resolution). Through two user studies with 76 participants (38 pairs), we evaluated these approaches in collaborative wireframe editing tasks. Study 1 identified Averaging as the optimal computational method, balancing task efficiency with user experience. Study 2 highlighted that Action-level Restriction, which permits overlapping selections but restricts concurrent identical operations, achieved better performance compared to exclusive object locking. Reactive strategies using averaging provided smooth collaboration for experienced users, while second-user priority enabled quick corrections. Our findings indicate that optimal strategy selection depends on task requirements, user expertise, and collaboration patterns. Based on the findings, we provide design implications for developing VR collaboration systems that support flexible sub-components manipulation while maintaining collaborative awareness and minimizing conflicts.
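The two reactive strategies named in the abstract, averaging and second-user priority, can be illustrated with a small sketch. It assumes each user proposes a target position for the same shared vertex. The function names and the tuple-based vertex representation are hypothetical, and the paper's exact formulation may differ.

```python
def resolve_by_averaging(current, proposals):
    """Average the target positions proposed by all users for one vertex.

    current:   (x, y, z) position before the conflicting edits.
    proposals: list of (x, y, z) targets, one per user, in arrival order.
    """
    if not proposals:
        return current  # nobody moved the vertex
    n = len(proposals)
    return tuple(sum(p[axis] for p in proposals) / n for axis in range(3))


def resolve_by_second_user_priority(current, proposals):
    """Let the most recent proposal win (the later user overrides)."""
    return proposals[-1] if proposals else current


# Two users drag the same vertex to different targets.
start = (0.0, 0.0, 0.0)
targets = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(resolve_by_averaging(start, targets))             # (1.0, 2.0, 0.0)
print(resolve_by_second_user_priority(start, targets))  # (0.0, 4.0, 0.0)
```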
HC | Dec 19, 2025
Perceptions of AI-CBT: Trust and Barriers in Chinese Postgrads
Chan-in Sio, Alex Mann, Lingxi Fan et al.
The mental well-being of graduate students is an increasing concern, yet the adoption of scalable support remains uneven. Artificial intelligence-powered cognitive behavioral therapy chatbots (AI-CBT) offer low-barrier help, but little is known about how Chinese postgraduates perceive and use them. This qualitative study explored perceptions and experiences of AI-CBT chatbots among ten Chinese graduate students recruited through social media. Semi-structured Zoom interviews were conducted and analyzed using reflexive thematic analysis, with the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB) as sensitizing frameworks. The findings indicate a cautious openness to AI-CBT chatbots: perceived usefulness and 24/7 access supported favorable attitudes, while data privacy, emotional safety, and uncertainty about 'fit' for complex problems restricted the intention to use. Social norms (e.g., stigma and peer views) and perceived control (digital literacy, language quality) further shaped adoption. The study offers context-specific information to guide the culturally sensitive design, communication, and deployment of AI mental well-being tools for student populations in China and outlines the design implications around transparency, safeguards, and graduated care pathways.
91.5 | CL | Apr 21 | Code
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
Jinyu Guo, Zhihan Zhang, Yutong Li et al.
The quadratic computational complexity of the standard attention mechanism constitutes a fundamental bottleneck for large language models in long-context inference. While existing KV cache compression methods alleviate memory pressure, they often sacrifice generation quality and fail to address the high overhead of floating-point arithmetic. This paper introduces DASH-KV, an innovative acceleration framework that reformulates attention as approximate nearest-neighbor search via asymmetric deep hashing. Under this paradigm, we design an asymmetric encoding architecture that differentially maps queries and keys to account for their distinctions in precision and reuse characteristics. To balance efficiency and accuracy, we further introduce a dynamic mixed-precision mechanism that adaptively retains full-precision computation for critical tokens. Extensive experiments on LongBench demonstrate that DASH-KV significantly outperforms state-of-the-art baseline methods while matching the performance of full attention, all while reducing inference complexity from O(N^2) to linear O(N). The code is available at https://github.com/Zhihan-Zh/DASH-KV
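To make the idea of treating attention as approximate nearest-neighbor search over hash codes concrete, here is a minimal sketch. It swaps in random-hyperplane (SimHash-style) codes for the learned asymmetric deep hashing the abstract describes, omits the mixed-precision mechanism, and ranks all keys with a full sort, so it does not reproduce the paper's method or its O(N) complexity. All function names are hypothetical.

```python
import math
import random


def make_planes(dim, bits, seed=0):
    """Random hyperplanes standing in for a learned hash function."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def hashed_attention(query, keys, values, key_codes, planes, top_k):
    """Pick the top_k keys closest to the query in Hamming space, then run
    exact softmax attention over only those keys."""
    q_code = hash_code(query, planes)
    ranked = sorted(range(len(keys)),
                    key=lambda i: hamming(q_code, key_codes[i]))
    chosen = ranked[:top_k]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, keys[i])) / scale
              for i in chosen]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    return [sum(w * values[i][d] for w, i in zip(weights, chosen)) / norm
            for d in range(len(values[0]))]


planes = make_planes(dim=4, bits=8)
keys = [[1, 0, 0, 0], [0, 1, 0, 0], [-1, 0, 0, 0]]
values = [[1.0], [2.0], [3.0]]
key_codes = [hash_code(k, planes) for k in keys]  # cached once per key
print(hashed_attention([1, 0, 0, 0], keys, values, key_codes, planes, top_k=2))
```

Key codes are computed once and cached alongside the KV entries, while the query is hashed at each step. This reuse difference is the kind of asymmetry the abstract alludes to, though the sketch uses the same hash for both sides.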
77.5 | AI | Apr 7
Experience Transfer for Multimodal LLM Agents in Minecraft Game
Chenghao Li, Jun Liu, Songbo Zhang et al.
Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repository of static records. To make transfer explicit, Echo decomposes reusable knowledge into five dimensions: structure, attribute, process, function, and interaction. This formulation allows the agent to identify recurring patterns shared across different tasks and infer what prior experience remains applicable in new situations. Building on this formulation, Echo leverages In-Context Analogy Learning (ICAL) to retrieve relevant experiences and adapt them to unseen tasks through contextual examples. Experiments in Minecraft show that, under a from-scratch learning setting, Echo achieves a 1.3x to 1.7x speed-up on object-unlocking tasks. Moreover, Echo exhibits a burst-like chain-unlocking phenomenon, rapidly unlocking multiple similar items within a short time interval after acquiring transferable experience. These results suggest that experience transfer is a promising direction for improving the efficiency and adaptability of multimodal LLM agents in complex interactive environments.
HC | Mar 6
Non-urgent Messages Do Not Jump into My Headset Suddenly! Adaptive Notification Design in Mixed Reality
Jingyao Zheng, Xian Wang, Sven Mayer et al.
Mixed reality (MR) notification systems currently display all messages in fixed central locations regardless of urgency, leading to unnecessary interruptions and cognitive overload. Drawing from previous MR/Virtual Reality (VR) notification design work and calm technology principles, we developed an adaptive notification system that adjusts spatial placement based on urgency levels: non-urgent notifications appear as peripheral icons accessible via head movement, moderately urgent messages anchor to the user's hand, and very urgent notifications transition progressively from peripheral to central view. Through a within-subjects study (N=18), we evaluated our adaptive system against the default centralised approach. Results demonstrate that the adaptive system significantly reduces mental workload (p=0.041), temporal workload (p=0.008), and frustration (p=0.004) while maintaining comparable notification awareness. Logistic regression analysis reveals that users prefer the adaptive system even with classification errors, provided the combined misclassification rate (disruptiveness + omission errors) remains below a determinable threshold. Our findings establish the first empirical evidence that urgency-based spatial notification distribution effectively addresses core MR usability challenges, offering practical design guidelines for immersive notification systems that balance user attention management with information accessibility.
HC | Nov 5, 2025
When Generative Artificial Intelligence meets Extended Reality: A Systematic Review
Xinyu Ning, Yan Zhuo, Xian Wang et al.
With the continuous advancement of technology, the application of generative artificial intelligence (AI) in various fields is gradually demonstrating great potential, particularly when combined with Extended Reality (XR), creating unprecedented possibilities. This survey article systematically reviews the applications of generative AI in XR, covering as much relevant literature as possible from 2023 to 2025. The application areas of generative AI in XR and its key technology implementations are summarised through PRISMA screening and analysis of the final 26 articles. The survey highlights existing articles from the last three years related to how XR utilises generative AI, providing insights into current trends and research gaps. We also explore potential opportunities for future research to further empower XR through generative AI, providing guidance and information for future generative XR research.
AI | Mar 8, 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng et al.
The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements, consisting of but not limited to core building blocks (vision, language, and temporal) and supporting features, from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon minute exploration of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation such as dataset, evaluation metric, efficient architecture, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-discipline research community towards its advancement as the first step to realize artificial general intelligence (AGI).
CV | Apr 4, 2024
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Haoran Li, Haolin Shi, Wenli Zhang et al.
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .
AI | Feb 10
GHS-TDA: A Synergistic Reasoning Framework Integrating Global Hypothesis Space with Topological Data Analysis
Jiaquan Zhang, Chaoning Zhang, Shuxu Chen et al.
Chain-of-Thought (CoT) has been shown to significantly improve the reasoning accuracy of large language models (LLMs) on complex tasks. However, due to the autoregressive, step-by-step generation paradigm, existing CoT methods suffer from two fundamental limitations. First, the reasoning process is highly sensitive to early decisions: once an initial error is introduced, it tends to propagate and amplify through subsequent steps, while the lack of a global coordination and revision mechanism makes such errors difficult to correct, ultimately leading to distorted reasoning chains. Second, current CoT approaches lack structured analysis techniques for filtering redundant reasoning and extracting key reasoning features, resulting in unstable reasoning processes and limited interpretability. To address these issues, we propose GHS-TDA. GHS-TDA first constructs a semantically enriched global hypothesis graph to aggregate, align, and coordinate multiple candidate reasoning paths, thereby providing alternative global correction routes when local reasoning fails. It then applies topological data analysis based on persistent homology to capture stable multi-scale structures, remove redundancy and inconsistencies, and extract a more reliable reasoning skeleton. By jointly leveraging reasoning diversity and topological stability, GHS-TDA achieves self-adaptive convergence, produces high-confidence and interpretable reasoning paths, and consistently outperforms strong baselines in terms of both accuracy and robustness across multiple reasoning benchmarks.
53.1 | CL | Apr 25
From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
Yitian Zhou, Chaoning Zhang, Jiaquan Zhang et al.
Long-context large language models remain computationally expensive to run and often fail to reliably process very long inputs, which makes context compression an important component of many systems. Existing compression approaches typically rely on trained compressors, dense retrieval-style selection, or heuristic trimming, and they often struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. To address this, we propose a training-free and model-agnostic compression framework that selects a compact set of sentences guided by structural graph priors. Our method constructs a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges, extracts a topic skeleton via clustering, and ranks sentences using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then produces a readable compressed context in original order. Experimental results on four datasets show that our approach is competitive with strong extractive and abstractive baselines, demonstrating larger gains on long-document benchmarks.
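Two steps of the pipeline described above lend themselves to a short sketch: building the hybrid sentence graph (mutual k-NN semantic edges plus short-range sequential edges) and the budgeted greedy selection with redundancy suppression. The sketch assumes sentence embeddings and per-sentence scores are already available, and it counts tokens by whitespace splitting. The function names, the cosine threshold `max_sim`, and the scoring input are illustrative assumptions rather than the authors' code.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_graph(embeddings, k=2, window=1):
    """Undirected edges: mutual k-NN pairs plus sentences within `window`
    positions of each other in the original order."""
    n = len(embeddings)
    neighbors = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(embeddings[i], embeddings[j]),
                        reverse=True)
        neighbors.append(set(ranked[:k]))
    edges = set()
    for i in range(n):
        for j in neighbors[i]:
            if i in neighbors[j]:          # keep only mutual neighbors
                edges.add((min(i, j), max(i, j)))
        for j in range(i + 1, min(n, i + window + 1)):
            edges.add((i, j))              # short-range sequential edge
    return edges


def greedy_select(sentences, scores, embeddings, token_budget, max_sim=0.9):
    """Take sentences by descending score while they fit the budget and are
    not near-duplicates of one already chosen; return in original order."""
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        cost = len(sentences[i].split())   # whitespace tokens (assumption)
        if used + cost > token_budget:
            continue
        if any(cosine(embeddings[i], embeddings[j]) > max_sim for j in chosen):
            continue
        chosen.append(i)
        used += cost
    return [sentences[i] for i in sorted(chosen)]


emb = [[1, 0], [1, 0.1], [0, 1], [0.1, 1]]
print(sorted(hybrid_graph(emb, k=1, window=1)))
print(greedy_select(["a b", "a b c", "d e", "f"], [0.9, 0.8, 0.7, 0.1], emb, 10))
```

In the described method, the score would combine task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue computed on this graph; here it is simply passed in.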
CL | Oct 23, 2024
LMLPA: Language Model Linguistic Personality Assessment
Jingyao Zheng, Xian Wang, Simo Hosio et al.
Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates the findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, the AI rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilising Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.
LG | Mar 31, 2025
MetaCLBench: Meta Continual Learning Benchmark on Resource-Constrained Edge Devices
Sijia Li, Young D. Kwon, Lik-Hang Lee et al.
Meta-Continual Learning (Meta-CL) has emerged as a promising approach to minimize manual labeling efforts and system resource requirements by enabling Continual Learning (CL) with limited labeled samples. However, while existing methods have shown success in image-based tasks, their effectiveness remains unexplored for sequential time-series data from sensor systems, particularly audio inputs. To address this gap, we conduct a comprehensive benchmark study evaluating six representative Meta-CL approaches using three network architectures on five datasets from both image and audio modalities. We develop MetaCLBench, an end-to-end Meta-CL benchmark framework for edge devices to evaluate system overheads and investigate trade-offs among performance, computational costs, and memory requirements across various Meta-CL methods. Our results reveal that while many Meta-CL methods enable learning new classes for both image and audio modalities, they impose significant computational and memory costs on edge devices. Also, we find that pre-training and meta-training procedures based on source data before deployment improve Meta-CL performance. Finally, to facilitate further research, we provide practical guidelines for researchers and machine learning practitioners implementing Meta-CL in resource-constrained environments and make our benchmark framework and tools publicly available, enabling fair evaluation across both accuracy and system-level metrics.
CL | Jan 24, 2024
APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
Yiming Zhu, Zhizhuo Yin, Gareth Tyson et al.
Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning -- techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve higher weighted F1-score on nine out of twelve experimented datasets, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
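The abstract reports results as weighted F1-score without spelling out the metric. The sketch below uses the standard definition, per-class F1 averaged with weights equal to each class's share of the true labels. It is assumed, not confirmed by the abstract, that this is the variant the authors use.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn, the number of true items in the class
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n / total
    return score


# Class "a": P=1, R=0.5, F1=2/3. Class "b": P=2/3, R=1, F1=0.8.
print(round(weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 4))  # 0.7333
```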
CV | May 12, 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang, Joseph Cho, Fachrina Dewi Puspitasari et al.
The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's applications and challenges, setting the stage for ongoing advancements in segmentation technology.
CV | May 10, 2023
Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
Chenghao Li, Chaoning Zhang, Joseph Cho et al.
Generative AI has made significant progress in recent years, with text-guided content generation being the most practical as it facilitates interaction between human instructions and AI-generated content (AIGC). Thanks to advancements in text-to-image and 3D modeling technologies, like neural radiance field (NeRF), text-to-3D has emerged as a nascent yet highly active research field. Our work conducts a comprehensive survey on this topic and follows up on subsequent research progress in the overall field, aiming to help readers interested in this direction quickly catch up with its rapid development. First, we introduce 3D data representations, including both structured and non-structured data. Building on this prerequisite, we introduce various core technologies to achieve satisfactory text-to-3D results. Additionally, we present mainstream baselines and research directions in recent text-to-3D technology, including fidelity, efficiency, consistency, controllability, diversity, and applicability. Furthermore, we summarize the usage of text-to-3D technology in various applications, including avatar generation, texture generation, scene generation and 3D editing. Finally, we discuss the agenda for the future development of text-to-3D.
HC | Feb 23, 2022
From Digital Media to Empathic Reality: A Systematic Review of Empathy Research in Extended Reality Environments
Ville Paananen, Mohammad Sina Kiarostami, Lik-Hang Lee et al.
Recent advances in extended reality (XR) technologies have enabled new and increasingly realistic empathy tools and experiences. In XR, all interactions take place in different spatial contexts, all with different features, affordances, and constraints. We present a systematic literature survey of recent work on empathy in XR. As a result, we contribute a research roadmap with three future opportunities in XR-enabled empathy research across both physical and virtual spaces.
CY | Feb 22, 2022
Towards User-Centered Metrics for Trustworthy AI in Immersive Cyberspace
Pengyuan Zhou, Benjamin Finley, Lik-Hang Lee et al.
AI plays a key role in current cyberspace and future immersive ecosystems that pinpoint user experiences. Thus, the trustworthiness of such AI systems is vital as failures in these systems can cause serious user harm. Although there are related works on exploring trustworthy AI (TAI) metrics in the current cyberspace, ecosystems towards user-centered services, such as the metaverse, are much more complicated in terms of system performance and user experience assessment, thus posing challenges for the applicability of existing approaches. Therefore, we give an overview of fairness, privacy and robustness, across the historical path from existing approaches. Finally, we propose a research agenda towards systematic yet user-centered TAI in immersive ecosystems.
HC | Jan 18, 2022
VibroWeight: Simulating Weight and Center of Gravity Changes of Objects in Virtual Reality for Enhanced Realism
Xian Wang, Diego Monteiro, Lik-Hang Lee et al.
Haptic feedback in virtual reality (VR) allows users to perceive the physical properties of virtual objects (e.g., their weight and motion patterns). However, the lack of haptic sensations deteriorates users' immersion and overall experience. In this work, we designed and implemented a low-cost hardware prototype with liquid metal, VibroWeight, which can complement commercial VR handheld controllers. VibroWeight is characterized by bimodal feedback cues in VR, driven by adaptive absolute mass (weights) and gravity shift. To our knowledge, liquid metal is used in a VR haptic device for the first time. Our study with 29 participants shows that VibroWeight delivers significantly better VR experiences in realism and comfort.
HC Jan 10, 2022
DiOS -- An Extended Reality Operating System for the Metaverse
Tristan Braud, Lik-Hang Lee, Ahmad Alhilal et al.
Driven by the recent improvements in device and network capabilities, Extended Reality (XR) is becoming more pervasive; industry and academia alike envision ambitious projects such as the metaverse. However, XR is still limited by the current architecture of mobile systems. This paper makes the case for an XR-specific operating system (XROS). Such an XROS integrates hardware support, computer vision algorithms, and XR-specific networking as the primitives supporting XR technology. These primitives represent the physical-digital world as a single shared resource among applications. Such an XROS allows for the development of coherent and system-wide interaction and display methods, systematic privacy preservation on sensor data, and performance improvement while simplifying application development.
CY Nov 26, 2021
When Creators Meet the Metaverse: A Survey on Computational Arts
Lik-Hang Lee, Zijun Lin, Rui Hu et al.
The metaverse, an enormous virtual-physical cyberspace, has brought unprecedented opportunities for artists to blend every corner of our physical surroundings with digital creativity. This article conducts a comprehensive survey on computational arts, in which seven critical topics are relevant to the metaverse, describing novel artworks in blended virtual-physical realities. The topics first cover the building elements for the metaverse, e.g., virtual scenes and characters, and auditory and textual elements. Next, several remarkable types of novel creations in the expanded horizons of metaverse cyberspace are reflected upon, such as immersive arts, robotic arts, and other user-centric approaches fuelling contemporary creative outputs. Finally, we propose several research agendas: democratising computational arts, digital privacy and safety for metaverse artists, ownership recognition for digital artworks, technological challenges, and so on. The survey also serves as introductory material for artists and metaverse technologists to begin creations in the realm of surrealistic cyberspace.
HC Nov 9, 2021
EdgeXAR: A 6-DoF Camera Multi-target Interaction Framework for MAR with User-friendly Latency Compensation
Wenxiao Zhang, Sikun Lin, Farshid Hassani Bijarbooneh et al.
The computational capabilities of recent mobile devices enable the processing of natural features for Augmented Reality (AR), but the scalability is still limited by the devices' computation power and available resources. In this paper, we propose EdgeXAR, a mobile AR framework that utilizes the advantages of edge computing through task offloading to support flexible camera-based AR interaction. We propose a hybrid tracking system for mobile devices that provides lightweight tracking with 6 Degrees of Freedom and hides the offloading latency from users' perception. A practical, reliable and unreliable communication mechanism is used to achieve fast response and consistency of crucial information. We also propose a multi-object image retrieval pipeline that executes fast and accurate image recognition tasks on the cloud and edge servers. Extensive experiments are carried out to evaluate the performance of EdgeXAR by building mobile AR Apps upon it. Regarding the Quality of Experience (QoE), the mobile AR Apps powered by EdgeXAR framework run on average at the speed of 30 frames per second with precise tracking of only 1~2 pixel errors and accurate image recognition of at least 97% accuracy. As compared to Vuforia, one of the leading commercial AR frameworks, EdgeXAR transmits 87% less data while providing a stable 30 FPS performance and reducing the offloading latency by 50 to 70% depending on the transmission medium. Our work facilitates the large-scale deployment of AR as the next generation of ubiquitous interfaces.
HC Jun 16, 2021
Mobile Augmented Reality: User Interfaces, Frameworks, and Intelligence
Jacky Cao, Kit-Yung Lam, Lik-Hang Lee et al.
Mobile Augmented Reality (MAR) integrates computer-generated virtual objects with physical environments for mobile devices. MAR systems enable users to interact with MAR devices, such as smartphones and head-worn wearables, and perform seamless transitions from the physical world to a mixed world with digital entities. These MAR systems support user experiences by using MAR devices to provide universal accessibility to digital content. Over the past 20 years, a number of MAR systems have been developed; however, the studies and design of MAR frameworks have not yet been systematically reviewed from the perspective of user-centric design. This article presents the first effort of surveying existing MAR frameworks (count: 37) and further discusses the latest studies on MAR through a top-down approach: 1) MAR applications; 2) MAR visualisation techniques adaptive to user mobility and contexts; 3) systematic evaluation of MAR frameworks including supported platforms and corresponding features such as tracking, feature extraction plus sensing capabilities; and 4) underlying machine learning approaches supporting intelligent operations within MAR systems. Finally, we summarise the development of emerging research fields and the current state of the art, and discuss the important open challenges and possible theoretical and technical directions. This survey aims to benefit both researchers and MAR system developers alike.
MM Jan 14, 2021
AICP: Augmented Informative Cooperative Perception
Pengyuan Zhou, Pranvera Kortoci, Yui-Pan Yau et al.
Connected vehicles, whether equipped with advanced driver-assistance systems or fully autonomous, require human driver supervision and are currently constrained to visual information in their line-of-sight. A cooperative perception system among vehicles increases their situational awareness by extending their perception range. Existing solutions focus on improving perspective transformation and fast information collection. However, such solutions fail to filter out large amounts of less relevant data and thus impose significant network and computation load. Moreover, presenting all this less relevant data can overwhelm the driver and thus actually hinder them. To address such issues, we present Augmented Informative Cooperative Perception (AICP), the first fast-filtering system which optimizes the informativeness of shared data at vehicles to improve the fused presentation. To this end, an informativeness maximization problem is presented for vehicles to select a subset of data to display to their drivers. Specifically, we propose (i) a dedicated system design with custom data structure and lightweight routing protocol for convenient data encapsulation, fast interpretation and transmission, and (ii) a comprehensive problem formulation and efficient fitness-based sorting algorithm to select the most valuable data to display at the application layer. We implement a proof-of-concept prototype of AICP with a bandwidth-hungry, latency-constrained real-life augmented reality application. The prototype adds only 12.6 milliseconds of latency to a current informativeness-unaware system. Next, we test the networking performance of AICP at scale and show that AICP effectively filters out less relevant packets and decreases the channel busy time.
HC Nov 12, 2019
Emerging Natural User Interfaces in Mobile Computing: A Bottoms-Up Survey
Kirill A. Shatilov, Dimitris Chatzopoulos, Lik-Hang Lee et al.
Mobile and wearable interfaces and interaction paradigms are highly constrained by the available screen real estate and by computational and power resources. Although there exist many ways of displaying information to mobile users, inputting data to a mobile device is usually limited to conventional touch-based interaction, which distracts users from their ongoing activities. Furthermore, emerging applications, like augmented, mixed and virtual reality (AR/MR/VR), require new types of input methods in order to interact with complex virtual worlds, challenging the traditional techniques of Human-Computer Interaction (HCI). Leveraging Natural User Interfaces (NUIs), a paradigm of using natural intuitive actions to interact with computing systems, is one of many ways to meet these challenges in mobile computing and its modern applications. Brain-Machine Interfaces that enable thought-only hands-free interaction, Myoelectric input methods that track body gestures, and gaze-tracking input interfaces are examples of NUIs applicable to mobile and wearable interactions. The wide adoption of wearable devices and the penetration of mobile technologies, alongside the growing market of AR/MR/VR, motivate the exploration and implementation of new interaction paradigms. The concurrent development of bio-signal acquisition techniques and accompanying ecosystems offers a useful toolbox to address open challenges. In this survey, we present state-of-the-art bio-signal acquisition methods, summarize and evaluate recent developments in the area of NUIs, and outline potential applications in mobile scenarios. The survey provides a bottoms-up overview starting from (i) underlying biological aspects and signal acquisition techniques, (ii) portable NUI hardware solutions, (iii) NUI-enabled applications, as well as (iv) research challenges and open problems.
HC Jul 31, 2017
Interaction Methods for Smart Glasses
Lik-Hang Lee, Pan Hui
Since the launch of Google Glass in 2014, smart glasses have mainly been designed to support micro-interactions. The ultimate goal for them to become an augmented reality interface has not yet been attained due to an encumbrance of controls. Augmented reality involves superimposing interactive computer graphics images onto physical objects in the real world. This survey reviews current research issues in the area of human-computer interaction for smart glasses. The survey first studies the smart glasses available on the market and afterwards investigates the interaction methods proposed in the wide body of literature. The interaction methods can be classified into hand-held, touch, and touchless input. This paper mainly focuses on touch and touchless input. Touch input can be further divided into on-device and on-body, while touchless input can be classified into hands-free and freehand. Next, we summarize the existing research efforts and trends, in which touch and touchless input are evaluated against a total of eight interaction goals. Finally, we discuss several key design challenges and the possibility of multi-modal input for smart glasses.