CVDec 25, 2022
EVM-CNN: Real-Time Contactless Heart Rate Estimation from Facial VideoYing Qiu, Yang Liu, Juan Arteaga-Falconi et al.
With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. As one of the most important pieces of physiological information, researchers have remotely estimated the heart rate (HR) from facial videos in recent years. Although progress has been made over the past few years, there are still some limitations, like the processing time increasing with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering and a convolutional neural network. Our proposed approach shows better performance compared with the benchmark on the MMSE-HR dataset in terms of both the average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.
CVDec 25, 2022
A Combined Approach Toward Consistent Reconstructions of Indoor Spaces Based on 6D RGB-D Odometry and KinectFusionNadia Figueroa, Haiwei Dong, Abdulmotaleb El Saddik
We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames by keypoint extraction and feature matching both on the RGB and depth image planes. Furthermore, we feed the estimated pose to the highly accurate KinectFusion algorithm, which uses a fast ICP (Iterative Closest Point) to fine-tune the frame-to-frame relative pose and fuse the depth data into a global implicit surface. We evaluate our method on a publicly available RGB-D SLAM benchmark dataset by Sturm et al. The experimental results show that our proposed reconstruction method solely based on visual odometry and KinectFusion outperforms the state-of-the-art RGB-D SLAM system accuracy. Moreover, our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.
CVDec 25, 2022
Learning to Estimate 3D Human Pose from Point CloudYufan Zhou, Haiwei Dong, Abdulmotaleb El Saddik
3D pose estimation is a challenging problem in computer vision. Most of the existing neural-network-based approaches address color or depth images through convolution networks (CNNs). In this paper, we study the task of 3D human pose estimation from depth images. Different from the existing CNN-based human pose estimation method, we propose a deep human pose network for 3D pose estimation by taking the point cloud data as input data to model the surface of complex human structures. We first cast the 3D human pose estimation from 2D depth images to 3D point clouds and directly predict the 3D joint position. Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-art methods. The reported results on both ITOP and EVAL datasets demonstrate the effectiveness of our method on the targeted tasks.
SPDec 25, 2022
Sitting Posture Recognition Using a Spiking Neural NetworkJianquan Wang, Basim Hafidh, Haiwei Dong et al.
To increase the quality of citizens' lives, we designed a personalized smart chair system to recognize sitting behaviors. The system can receive surface pressure data from the designed sensor and provide feedback for guiding the user towards proper sitting postures. We used a liquid state machine and a logistic regression classifier to construct a spiking neural network for classifying 15 sitting postures. To allow this system to read our pressure data into the spiking neurons, we designed an algorithm to encode map-like data into cosine-rank sparsity data. The experimental results consisting of 15 sitting postures from 19 participants show that the prediction precision of our SNN is 88.52%.
NIJan 29, 2023
A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data CentersShiva Ketabi, Hongkai Chen, Haiwei Dong et al.
Various congestion control protocols have been designed to achieve high performance in different network environments. Modern online learning solutions that delegate the congestion control actions to a machine cannot properly converge in the stringent time scales of data centers. We leverage multiagent reinforcement learning to design a system for dynamic tuning of congestion control parameters at end-hosts in a data center. The system includes agents at the end-hosts to monitor and report the network and traffic states, and agents to run the reinforcement learning algorithm given the states. Based on the state of the environment, the system generates congestion control parameters that optimize network performance metrics such as throughput and latency. As a case study, we examine BBR, an example of a prominent recently-developed congestion control protocol. Our experiments demonstrate that the proposed system has the potential to mitigate the problems of static parameters.
SEMay 18
Supporting System Testing with a Multi-Agent LLM-based Framework for Knowledge Graph Extraction: A Case Study with Ethernet Switch SystemsRongqi Pan, Mahboubeh Dadkhah, Jean Baptiste Minani et al.
Technical documents contain rich domain knowledge for automating downstream tasks such as system testing. While this paper focuses on Ethernet switch configuration manuals (ESCMs), we propose a general framework that can be adapted to different industrial contexts. ESCMs provide valuable domain knowledge for Ethernet switch testing, but their semi-structured format, implicit step attributes, and complex section dependencies make them difficult to directly leverage for test automation. To address this, we generate knowledge graphs (KGs) that capture configuration knowledge from ESCM in a structured form. We propose a multi-agent LLM-based framework that extracts, evaluates, and improves KGs from ESCMs using a fine-grained KG schema and an iterative Extract-Evaluate-Improve (EEI) loop. Our evaluation on 50 real-world ESCMs shows that our framework achieves high extraction correctness using the original prompts, with average correctness scores ranging from 0.97 to 0.99 across three extraction tasks. For challenging ESCMs, the EEI loop further improves correctness through manual-specific prompt refinement. Moreover, the LLM judgments and human evaluations show substantial agreement, with Cohen's kappa of at least 0.72 across all extraction tasks. Finally, feedback from industry testers indicates that the generated KGs can support the generation of useful and correct test case specifications (TCSs) for downstream testing.
MMMay 13, 2024
MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint PredictionHaopeng Wang, Zijian Long, Haiwei Dong et al.
Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpoint prediction is severely limited by the inherent uncertainty in head movement, which can not cope with the sudden movement of users very well. This paper first presents a multimodal spatial-temporal attention transformer to generate multiple viewpoint trajectories with their probabilities given a historical trajectory. The proposed method models viewpoint prediction as a classification problem and uses attention mechanisms to capture the spatial and temporal characteristics of input video frames and viewpoint trajectories for multi-viewpoint prediction. After that, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm utilizing multi-viewpoint prediction for 360° video streaming is proposed for maximizing different QoE objectives under various network conditions. We formulate the ABR problem as a decentralized partially observable Markov decision process (Dec-POMDP) problem and present a MAPPO algorithm based on centralized training and decentralized execution (CTDE) framework to solve the problem. The experimental results show that our proposed method improves the defined QoE metric by up to 85.5% compared to existing ABR methods.
MMMar 27, 2025
Immersive Multimedia Communication: State-of-the-Art on eXtended Reality StreamingHaopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik
Extended reality (XR) is rapidly advancing, and poised to revolutionize content creation and consumption. In XR, users integrate various sensory inputs to form a cohesive perception of the virtual environment. This survey reviews the state-of-the-art in XR streaming, focusing on multiple paradigms. To begin, we define XR and introduce various XR headsets along with their multimodal interaction methods to provide a foundational understanding. We then analyze XR traffic characteristics to highlight the unique data transmission requirements. We also explore factors that influence the quality of experience in XR systems, aiming to identify key elements for enhancing user satisfaction. Following this, we present visual attention-based optimization methods for XR streaming to improve efficiency and performance. Finally, we examine current applications and highlight challenges to provide insights into ongoing and future developments of XR.
LGJun 19, 2025
Adaptive Social Metaverse Streaming based on Federated Multi-Agent Deep Reinforcement LearningZijian Long, Haopeng Wang, Haiwei Dong et al.
The social metaverse is a growing digital ecosystem that blends virtual and physical worlds. It allows users to interact socially, work, shop, and enjoy entertainment. However, privacy remains a major challenge, as immersive interactions require continuous collection of biometric and behavioral data. At the same time, ensuring high-quality, low-latency streaming is difficult due to the demands of real-time interaction, immersive rendering, and bandwidth optimization. To address these issues, we propose ASMS (Adaptive Social Metaverse Streaming), a novel streaming system based on Federated Multi-Agent Proximal Policy Optimization (F-MAPPO). ASMS leverages F-MAPPO, which integrates federated learning (FL) and deep reinforcement learning (DRL) to dynamically adjust streaming bit rates while preserving user privacy. Experimental results show that ASMS improves user experience by at least 14% compared to existing streaming methods across various network conditions. Therefore, ASMS enhances the social metaverse experience by providing seamless and immersive streaming, even in dynamic and resource-constrained networks, while ensuring that sensitive user data remains on local devices.
IVMar 27, 2025
Empirical Studies of Large Scale Environment Scanning by Consumer ElectronicsMengyuan Wang, Yang Liu, Haopeng Wang et al.
This paper presents an empirical evaluation of the Matterport Pro3, a consumer-grade 3D scanning device, for large-scale environment reconstruction. We conduct detailed scanning (1,099 scanning points) of a six-floor building (17,567 square meters) and assess the device's effectiveness, limitations, and performance enhancements in diverse scenarios. Challenges encountered during the scanning are addressed through proposed solutions, while we also explore advanced methods to overcome them more effectively. Comparative analysis with another consumer-grade device (iPhone) highlights the Pro3's balance between cost-effectiveness and performance. The Matterport Pro3 achieves a denser point cloud with 1,877,324 points compared to the iPhone's 506,961 points and higher alignment accuracy with an RMSE of 0.0118 meters. The cloud-to-cloud (C2C) average distance error between the two point cloud models is 0.0408 meters, with a standard deviation of 0.0715 meters. The study demonstrates the Pro3's ability to generate high-quality 3D models suitable for large-scale applications, leveraging features such as LiDAR and advanced alignment techniques.
HCJan 27, 2025
MetaDecorator: Generating Immersive Virtual Tours through MultimodalityShuang Xie, Yang Liu, Jeannie S. A. Lee et al.
MetaDecorator, is a framework that empowers users to personalize virtual spaces. By leveraging text-driven prompts and image synthesis techniques, MetaDecorator adorns static panoramas captured by 360° imaging devices, transforming them into uniquely styled and visually appealing environments. This significantly enhances the realism and engagement of virtual tours compared to traditional offerings. Beyond the core framework, we also discuss the integration of Large Language Models (LLMs) and haptics in the VR application to provide a more immersive experience.
MMJan 22, 2025
Leveraging LLMs to Create a Haptic Devices' Recommendation SystemYang Liu, Haiwei Dong, Abdulmotaleb El Saddik
Haptic technology has seen significant growth, yet a lack of awareness of existing haptic device design knowledge hinders development. This paper addresses these limitations by leveraging advancements in Large Language Models (LLMs) to develop a haptic agent, focusing specifically on Grounded Force Feedback (GFF) devices recommendation. Our approach involves automating the creation of a structured haptic device database using information from research papers and product specifications. This database enables the recommendation of relevant GFF devices based on user queries. To ensure precise and contextually relevant recommendations, the system employs a dynamic retrieval method that combines both conditional and semantic searches. Benchmarking against the established UEQ and existing haptic device searching tools, the proposed haptic recommendation agent ranks in the top 10\% across all UEQ categories with mean differences favoring the agent in nearly all subscales, and maintains no significant performance bias across different user groups, showcasing superior usability and user satisfaction.
MMDec 23, 2023
Human-Centric Resource Allocation for the Metaverse With Multiaccess Edge ComputingZijian Long, Haiwei Dong, Abdulmotaleb El Saddik
Multi-access edge computing (MEC) is a promising solution to the computation-intensive, low-latency rendering tasks of the metaverse. However, how to optimally allocate limited communication and computation resources at the edge to a large number of users in the metaverse is quite challenging. In this paper, we propose an adaptive edge resource allocation method based on multi-agent soft actor-critic with graph convolutional networks (SAC-GCN). Specifically, SAC-GCN models the multi-user metaverse environment as a graph where each agent is denoted by a node. Each agent learns the interplay between agents by graph convolutional networks with self-attention mechanism to further determine the resource usage for one user in the metaverse. The effectiveness of SAC-GCN is demonstrated through the analysis of user experience, balance of resource allocation, and resource utilization rate by taking a virtual city park metaverse as an example. Experimental results indicate that SAC-GCN outperforms other resource allocation methods in improving overall user experience, balancing resource allocation, and increasing resource utilization rate by at least 27%, 11%, and 8%, respectively.