Xiaobo Zhou

h-index: 10

15 papers

183 citations

Novelty: 42%

AI Score: 46

Ranked #62,906 of 201,326 authors (top 31%)
#22,672 in CV (top 38%)

15 Papers

CV · Nov 8, 2025 · Code

StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video

Zhihui Ke, Yuyang Liu, Xiaobo Zhou et al.

Streaming free-viewpoint video (FVV) in real time still faces significant challenges, particularly in training, rendering, and transmission efficiency. Harnessing the superior performance of 3D Gaussian Splatting (3DGS), recent 3DGS-based FVV methods have achieved notable breakthroughs in both training and rendering. However, the storage requirements of these methods can reach up to 10 MB per frame, making real-time FVV streaming impossible. To address this problem, we propose a novel FVV representation, dubbed StreamSTGS, designed for real-time streaming. StreamSTGS represents a dynamic scene using canonical 3D Gaussians, temporal features, and a deformation field. For high compression efficiency, we encode canonical Gaussian attributes as 2D images and temporal features as a video. This design not only enables real-time streaming but also inherently supports adaptive bitrate control based on network conditions without any extra training. Moreover, we propose a sliding window scheme that aggregates adjacent temporal features to learn local motions, and we introduce a transformer-guided auxiliary training module to learn global motions. On diverse FVV benchmarks, StreamSTGS demonstrates competitive performance on all metrics compared to state-of-the-art methods. Notably, StreamSTGS increases PSNR by 1 dB on average while reducing the average frame size to just 170 KB. The code is publicly available at https://github.com/kkkzh/StreamSTGS.
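
The sliding window idea can be illustrated with a minimal sketch, assuming each frame carries one temporal feature vector and that "aggregation" is a plain mean over a symmetric window of hypothetical half-width `radius`. StreamSTGS learns its aggregation; this only shows the windowing pattern, and `sliding_window_aggregate` is a hypothetical helper name.

```python
from typing import List, Sequence


def sliding_window_aggregate(
    features: Sequence[Sequence[float]], radius: int = 1
) -> List[List[float]]:
    """Average each frame's feature vector with its neighbours within `radius` frames.

    Hypothetical sketch: a plain mean over a symmetric window that is clamped
    at the start and end of the sequence, not the learned StreamSTGS scheme.
    """
    n = len(features)
    out: List[List[float]] = []
    for t in range(n):
        # Clamp the window [t - radius, t + radius] to valid frame indices.
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = features[lo:hi]
        dim = len(window[0])
        out.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return out


# Three frames, each with a 2-D temporal feature.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(sliding_window_aggregate(frames, radius=1))
# [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
```

Each output frame now mixes in information from its neighbours, which is what lets a downstream deformation field see short-range (local) motion rather than a single isolated timestamp.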

CV · Sep 19, 2024

FlexiTex: Enhancing Texture Generation via Visual Guidance

DaDong Jiang, Xianghui Yang, Zibo Zhao et al.

Recent texture generation methods achieve impressive results thanks to the powerful generative priors they leverage from large-scale text-to-image diffusion models. However, abstract textual prompts provide limited global texture or shape information, which leads texture generation methods to produce blurry or inconsistent patterns. To tackle this, we present FlexiTex, which embeds rich information via visual guidance to generate high-quality textures. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. To further enhance the visual guidance, we introduce a Direction-Aware Adaptation module that automatically designs direction prompts based on different camera poses, avoiding the Janus problem and maintaining global semantic consistency. Benefiting from the visual guidance, FlexiTex produces quantitatively and qualitatively sound results, demonstrating its potential to advance texture generation for real-world applications.

CV · Oct 25, 2023

4D-Editor: Interactive Object-level Editing in Dynamic Neural Radiance Fields via Semantic Distillation

Dadong Jiang, Zhihui Ke, Xiaobo Zhou et al.

This paper targets interactive object-level editing (e.g., deletion, recoloring, transformation, composition) in dynamic scenes. Recently, some methods for flexibly editing static scenes represented by neural radiance fields (NeRF) have shown impressive synthesis quality, while similar capabilities for time-variant dynamic scenes remain limited. To solve this problem, we propose 4D-Editor, an interactive semantic-driven editing framework that allows editing multiple objects in a dynamic NeRF with user strokes on a single frame. We extend the original dynamic NeRF with hybrid semantic feature distillation to maintain spatial-temporal consistency after editing. In addition, we design Recursive Selection Refinement, which significantly boosts object segmentation accuracy within a dynamic NeRF to aid the editing process. Moreover, we develop Multi-view Reprojection Inpainting to fill holes caused by incomplete scene capture after editing. Extensive experiments and editing examples on real-world scenes demonstrate that 4D-Editor achieves photo-realistic editing on dynamic NeRFs. Project page: https://patrickddj.github.io/4D-Editor

CE · Nov 11, 2025

CometNet: Contextual Motif-guided Long-term Time Series Forecasting

Weixu Wang, Xiaobo Zhou, Xin Qiao et al.

Long-term time series forecasting is crucial across numerous critical domains, yet its accuracy remains fundamentally constrained by the receptive-field bottleneck of existing models. Mainstream Transformer- and Multi-layer Perceptron (MLP)-based methods mainly rely on finite look-back windows, which limits their ability to model long-term dependencies and hurts forecasting performance. Naively extending the look-back window proves ineffective, as it not only introduces prohibitive computational complexity but also drowns vital long-term dependencies in historical noise. To address these challenges, we propose CometNet, a novel Contextual Motif-guided Long-term Time Series Forecasting framework. CometNet first introduces a Contextual Motif Extraction module that identifies recurrent, dominant contextual motifs in complex historical sequences, providing temporal dependencies that extend far beyond limited look-back windows. Subsequently, a Motif-guided Forecasting module integrates the extracted dominant motifs into forecasting. By dynamically mapping the look-back window to its relevant motifs, CometNet effectively harnesses their contextual information to strengthen long-term forecasting capability. Extensive experiments on eight real-world datasets demonstrate that CometNet significantly outperforms current state-of-the-art (SOTA) methods, particularly on extended forecast horizons.
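
The step of "mapping the look-back window to its relevant motifs" can be sketched with a simple stand-in: z-normalize the window and every stored motif, then pick the motif at the smallest Euclidean distance. This nearest-neighbour retrieval is an assumption for illustration only, not CometNet's learned mapping, and `z_normalize`, `nearest_motif`, and the toy motif bank are hypothetical.

```python
import math
from typing import List, Sequence


def z_normalize(x: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    # A constant series carries no shape information; map it to all zeros.
    return [(v - mu) / sd if sd > 0 else 0.0 for v in x]


def nearest_motif(window: Sequence[float], motifs: Sequence[Sequence[float]]) -> int:
    """Index of the motif whose z-normalized shape is closest (Euclidean) to the window."""
    w = z_normalize(window)
    best_idx, best_dist = -1, math.inf
    for i, motif in enumerate(motifs):
        if len(motif) != len(w):
            raise ValueError("motifs must have the same length as the window")
        m = z_normalize(motif)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, m)))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


look_back = [1.0, 2.0, 3.0, 4.0]
motif_bank = [[4.0, 3.0, 2.0, 1.0], [10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 5.0, 5.0]]
print(nearest_motif(look_back, motif_bank))  # 1: same rising shape at a different scale
```

Z-normalization makes the match depend on shape rather than level or scale, which is why the rising motif at ten times the amplitude is selected over the falling one.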

CV · Mar 23, 2024

DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

Hao Yan, Zhihui Ke, Xiaobo Zhou et al.

Implicit neural representations for video (NeRV) have recently emerged as a novel approach to high-quality video representation. However, existing works employ a single network to represent the entire video, which implicitly conflates static and dynamic information. As a result, they cannot effectively compress redundant static information and lack explicit modeling of globally temporal-coherent dynamic details. To solve these problems, we propose DS-NeRV, which decomposes videos into sparse learnable static codes and dynamic codes without the need for explicit optical flow or residual supervision. By setting different sampling rates for the two codes and applying weighted-sum and interpolation sampling methods, DS-NeRV efficiently utilizes redundant static information while maintaining high-frequency details. Additionally, we design a cross-channel attention-based (CCA) fusion module to efficiently fuse these two codes for frame decoding. Thanks to the separate static and dynamic code representation, our approach achieves high-quality reconstruction of 31.2 dB PSNR with only 0.35M parameters and outperforms existing NeRV methods in many downstream tasks. Our project website is at https://haoyan14.github.io/DS-NeRV.
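
The effect of different sampling rates can be sketched as follows, assuming sparse codes are stored every `stride` frames and a frame's code is obtained by linear interpolation between the two nearest stored codes. The helper name, the strides, and the toy values are hypothetical; DS-NeRV fuses the two codes with its learned CCA module, which is replaced here by plain concatenation.

```python
from typing import List, Sequence


def interpolate_code(codes: Sequence[Sequence[float]], stride: int, t: int) -> List[float]:
    """Linearly interpolate the code for frame t from codes stored every `stride` frames."""
    pos = t / stride
    i = min(int(pos), len(codes) - 1)
    j = min(i + 1, len(codes) - 1)
    if i == j:  # at or past the last stored code: no neighbour to blend with
        return list(codes[i])
    w = pos - i  # fractional distance from code i toward code j
    return [(1 - w) * a + w * b for a, b in zip(codes[i], codes[j])]


# Sparse static codes (one every 4 frames) and dense dynamic codes (one per frame).
static_codes = [[0.0], [4.0]]
dynamic_codes = [[0.1], [0.2], [0.3], [0.4], [0.5]]

t = 2
static = interpolate_code(static_codes, stride=4, t=t)    # [2.0]
dynamic = interpolate_code(dynamic_codes, stride=1, t=t)  # [0.3]
# Placeholder fusion: DS-NeRV uses a learned CCA module instead of concatenation.
frame_code = static + dynamic
print(frame_code)  # [2.0, 0.3]
```

Storing the slowly varying static content at a coarse stride is what saves parameters: five frames here need only two static codes, while the dynamic codes keep per-frame detail.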

QM · Jan 8, 2024

Advancing bioinformatics with large language models: components, applications and perspectives

Jiajia Liu, Mengyuan Yang, Yankai Yu et al.

Large language models (LLMs) are a class of artificial intelligence models based on deep learning that perform well across a variety of tasks, especially in natural language processing (NLP). LLMs typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled data using self-supervised or semi-supervised learning. Notably, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we provide a comprehensive overview of the essential components of LLMs in bioinformatics, spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, the core attention mechanism, and the pre-training processes underlying these models. Additionally, we introduce currently available foundation models and highlight their downstream applications across various bioinformatics domains. Finally, drawing on our experience, we offer practical guidance for both LLM users and developers, emphasizing strategies to optimize their use and foster further innovation in the field.

CV · Dec 16, 2024

SitPose: Real-Time Detection of Sitting Posture and Sedentary Behavior Using Ensemble Learning With Depth Sensor

Hang Jin, Xin He, Lingyun Wang et al.

Poor sitting posture can lead to various work-related musculoskeletal disorders (WMSDs). Office employees spend approximately 81.8% of their working time seated, and sedentary behavior can result in chronic diseases such as cervical spondylosis and cardiovascular disease. To address these health concerns, we present SitPose, a sitting posture and sedentary behavior detection system using the latest Kinect depth camera. The system tracks the 3D coordinates of skeletal joints in real time and calculates the angles of the relevant joints. By recruiting 36 participants, we built a dataset covering six sitting postures and one standing posture, totaling 33,409 data points. We applied several state-of-the-art machine learning algorithms to the dataset and compared their performance in recognizing sitting postures. Our results show that an ensemble learning model based on soft voting achieves the highest F1 score, 98.1%. Finally, we deployed SitPose based on this ensemble model to encourage better sitting posture and reduce sedentary habits.
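
Soft voting, the ensemble mechanism named above, averages the class probabilities predicted by each model and takes the arg-max, whereas hard voting counts each model's top choice. A minimal sketch follows; the three probability vectors and the reading of class 0 as "upright" and class 1 as "slouching" are hypothetical, not SitPose data.

```python
from collections import Counter
from typing import Sequence


def soft_vote(probs: Sequence[Sequence[float]]) -> int:
    """Average the class-probability vectors of all models and return the arg-max class."""
    n_models = len(probs)
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def hard_vote(probs: Sequence[Sequence[float]]) -> int:
    """Majority vote over each model's own arg-max prediction, for comparison."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in probs]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three models for one sample (class 0 = upright, 1 = slouching).
probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(soft_vote(probs), hard_vote(probs))  # 0 1
```

The two rules disagree here: one model is very confident in class 0 while the other two lean only weakly toward class 1, so soft voting, which weighs confidence, picks class 0.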

CV · Nov 18, 2024

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

DaDong Jiang, Zhihui Ke, Xiaobo Zhou et al.

Dynamic scene reconstruction is a long-standing challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints such as motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, which makes it challenging to reconstruct complex scenes, particularly those with violent movement, extreme-shaped geometries, or reflective surfaces. To address this issue, we design a plug-and-play module called TimeFormer that gives existing deformable 3D Gaussian reconstruction methods the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer includes a Cross-Temporal Transformer Encoder, which adaptively learns the temporal relationships of deformable 3D Gaussians. Furthermore, we propose a two-stream optimization strategy that transfers the motion knowledge learned by TimeFormer to the base stream during training. This allows TimeFormer to be removed during inference, preserving the original rendering speed. Extensive experiments on multi-view and monocular dynamic scenes validate the qualitative and quantitative improvements brought by TimeFormer. Project Page: https://patrickddj.github.io/TimeFormer/

QM · Sep 16, 2025

Unleashing the power of computational insights in revealing the complexity of biological systems in the new era of spatial multi-omics

Zhiwei Fan, Tiangang Wang, Kexin Huang et al.

Recent advances in spatial omics technologies have revolutionized our ability to study biological systems with unprecedented resolution. By preserving the spatial context of molecular measurements, these methods enable comprehensive mapping of cellular heterogeneity, tissue architecture, and dynamic biological processes in developmental biology, neuroscience, oncology, and evolutionary studies. This review provides a systematic overview of the continuing advances in both technology and computational algorithms that are paving the way for a deeper, more systematic understanding of the structure and mechanisms of mammalian tissues and organs through spatial multi-omics. We illustrate how advanced machine learning algorithms and multi-omics integrative modeling can decode complex biological processes, including the spatial organization and topological relationships of cells during organ development, as well as key molecular signatures and regulatory networks underlying tumorigenesis and metastasis. Finally, we outline future directions for technological innovation and modeling insights in spatial omics for precision medicine.

CV · Jul 8, 2025

GSVR: 2D Gaussian-based Video Representation for 800+ FPS with Hybrid Deformation Field

Zhizhuo Pang, Zhihui Ke, Xiaobo Zhou et al.

Implicit neural representations for video have been recognized as a novel and promising form of video representation. Existing works focus on improving reconstruction quality but pay little attention to decoding speed. However, the heavy computation of the convolutional networks used in existing methods leads to low decoding speed. Moreover, these convolution-based video representation methods also suffer from long training times, about 14 seconds per frame to reach 35+ PSNR on Bunny. To solve these problems, we propose GSVR, a novel 2D Gaussian-based video representation that achieves 800+ FPS and 35+ PSNR on Bunny while needing only 2 seconds of training per frame. Specifically, we propose a hybrid deformation field to model video dynamics, which combines two motion patterns, tri-plane motion and polynomial motion, to handle the coupling of camera motion and object motion. Furthermore, we propose a Dynamic-aware Time Slicing strategy that adaptively divides the video into multiple groups of pictures (GOPs) based on its dynamic level, in order to handle large camera motion and non-rigid movements. Finally, we propose quantization-aware fine-tuning to avoid performance degradation after quantization, and we use image codecs to compress the Gaussians into a compact representation. Experiments on the Bunny and UVG datasets confirm that our method converges much faster than existing methods and decodes 10x faster than other methods. Our method achieves video interpolation performance comparable to SOTA and better video compression performance than NeRV.
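
The polynomial half of the hybrid deformation field can be pictured with a minimal sketch, assuming each 2D Gaussian centre moves as a polynomial in normalized time t: position(t) = base + sum over k of coeffs[k] * t^(k+1). The function name, the per-axis coefficient layout, and the degree are assumptions; the tri-plane motion component is not modeled.

```python
from typing import Sequence, Tuple


def polynomial_position(
    base: Tuple[float, float],
    coeffs: Sequence[Tuple[float, float]],
    t: float,
) -> Tuple[float, float]:
    """2D Gaussian centre at normalized time t: base + sum_k coeffs[k] * t**(k + 1)."""
    x, y = base
    for k, (cx, cy) in enumerate(coeffs):
        x += cx * t ** (k + 1)
        y += cy * t ** (k + 1)
    return (x, y)


# Hypothetical Gaussian drifting right linearly and moving up quadratically.
print(polynomial_position((1.0, 2.0), [(0.5, 0.0), (0.0, 1.0)], t=0.5))  # (1.25, 2.25)
```

A low-degree polynomial needs only a few coefficients per Gaussian and is cheap to evaluate per frame, which fits the goal of fast decoding; motions that are not smooth in time are what a complementary component such as the tri-plane field would be expected to cover.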

IV · Jul 28, 2021

AI assisted method for efficiently generating breast ultrasound screening reports

Shuang Ge, Qiongyu Ye, Wenquan Xie et al.

Background: Ultrasound is one of the preferred modalities for early breast cancer screening in dense breasts. Clinically, doctors must write screening reports manually, which is time-consuming, laborious, and prone to omissions and errors. Aim: We propose a new pipeline that automatically generates AI breast ultrasound screening reports from ultrasound images, aiming to help doctors improve clinical screening efficiency and reduce repetitive report writing. Methods: AI was used to efficiently generate personalized preliminary breast ultrasound screening reports, especially for benign and normal cases, which account for the majority. Based on the preliminary AI report, doctors then make simple adjustments or corrections to quickly produce the final report. The approach was trained and tested on a database of 4,809 breast tumor instances. Results: Experimental results indicate that this pipeline improves doctors' work efficiency by up to 90%, greatly reducing repetitive work. Conclusion: In clinical practice, doctors recognize personalized report generation more widely than non-intelligent reports based on fixed templates or fill-in-the-blank options.

IT · Mar 30, 2021

Intelligent Reflecting Surface for Wireless Communication Security and Privacy

Shihao Yan, Xiaobo Zhou, Derrick Wing Kwan Ng et al.

Intelligent reflecting surface (IRS) is emerging as a promising technique for future wireless communications. Given its excellent capability to customize channel conditions via energy focusing and energy nulling, it is an ideal technique for enhancing wireless communication security and privacy, through the theories of physical layer security and covert communications, respectively. In this article, we first present some results on applying IRS to improve the average secrecy rate in wiretap channels, to enable perfect communication covertness, and to deliberately create extra randomness in wireless propagation for hiding active wireless transmissions. Then, we identify multiple challenges for future research to fully unlock the benefits offered by IRS in the context of physical layer security and covert communications. With the aid of extensive numerical studies, we demonstrate the necessity of designing the amplitudes of the IRS elements with security and privacy in mind, since the optimal values are not always 1 as commonly adopted in the literature. Furthermore, we reveal the tradeoff between the achievable secrecy performance and the estimation accuracy of the IRS's channel state information (CSI) at both the legitimate and malicious users, which presents a fundamental resource allocation challenge for IRS-aided physical layer security. Finally, a passive channel estimation methodology exploiting deep neural networks and scene images is discussed as a potential way to obtain CSI without resource-hungry pilots. This methodology offers a viable pathway to significantly improving the covert communication rate in IRS-aided wireless networks.
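
Why a unit amplitude is not always optimal can be seen in a toy scalar model, assumed here for illustration and not taken from the article: one direct path plus one reflected path scaled by amplitude beta and phase theta, with the standard secrecy rate C_s = max(0, log2(1 + SNR|h_B|^2) - log2(1 + SNR|h_E|^2)). All channel values and the fixed phase below are hypothetical.

```python
import cmath
import math


def secrecy_rate(h_bob: complex, h_eve: complex, snr: float) -> float:
    """C_s = max(0, log2(1 + snr*|h_B|^2) - log2(1 + snr*|h_E|^2)) in bit/s/Hz."""
    c_bob = math.log2(1 + snr * abs(h_bob) ** 2)
    c_eve = math.log2(1 + snr * abs(h_eve) ** 2)
    return max(0.0, c_bob - c_eve)


def effective_channel(h_direct: complex, h_reflect: complex, beta: float, theta: float) -> complex:
    """Direct path plus a single reflected path scaled by amplitude beta and phase theta."""
    return h_direct + beta * cmath.exp(1j * theta) * h_reflect


# Hypothetical toy numbers: the reflected path is weak toward Bob but strong toward Eve,
# and theta = pi is fixed so that the reflection interferes destructively at Eve.
SNR, THETA = 10.0, math.pi
H_BOB_D, H_BOB_R = 1.0, 0.2
H_EVE_D, H_EVE_R = 0.5, 1.0


def rate_for(beta: float) -> float:
    h_b = effective_channel(H_BOB_D, H_BOB_R, beta, THETA)
    h_e = effective_channel(H_EVE_D, H_EVE_R, beta, THETA)
    return secrecy_rate(h_b, h_e, SNR)


betas = [0.0, 0.25, 0.5, 0.75, 1.0]
best_beta = max(betas, key=rate_for)
print(best_beta)  # 0.5: half-amplitude reflection cancels Eve's direct path
```

With beta = 1 the reflection overshoots and Eve again receives a strong signal; beta = 0.5 nulls Eve's channel at the cost of a small loss for Bob, giving the highest secrecy rate on this grid.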

CV · Feb 16, 2019

DC-AL GAN: Pseudoprogression and True Tumor Progression of Glioblastoma Multiform Image Classification Based on DCGAN and AlexNet

Meiyu Li, Hailiang Tang, Michael D. Chan et al.

Pseudoprogression (PsP) occurs in 20-30% of patients with glioblastoma multiforme (GBM) after standard treatment. In post-treatment magnetic resonance imaging (MRI), PsP resembles the true tumor progression (TTP) of GBM in shape and intensity. These similarities make it challenging to differentiate the two types of progression and hence to select an appropriate clinical treatment strategy. In this paper, we introduce DC-AL GAN, a novel feature learning method based on a deep convolutional generative adversarial network (DCGAN) and AlexNet, to discriminate between PsP and TTP in MRI images. Owing to the adversarial relationship between the generator and the discriminator of the DCGAN, high-level discriminative features of PsP and TTP can be derived by the AlexNet-based discriminator. In addition, a feature fusion scheme combines higher-layer features with lower-layer information, yielding more powerful features for effectively discriminating between PsP and TTP. Experimental results show that DC-AL GAN achieves PsP and TTP classification performance superior to other state-of-the-art methods.

LG · Jan 6, 2019

Efforts estimation of doctors annotating medical image

Yang Deng, Yao Sun, Yongpei Zhu et al.

Accurate annotation of medical images is a crucial step for the clinical application of image-based AI. However, annotating medical images incurs substantial effort and expense because of their high complexity and the need for experienced doctors. To reduce annotation cost, several active learning methods have been proposed. However, such methods only reduce the number of annotation candidates and do not study how much effort the doctor actually spends, which is insufficient since even annotating a small amount of medical data takes doctors a lot of time. In this paper, we propose a new criterion to evaluate the effort doctors spend annotating medical images. First, by combining active learning with a U-shaped network, we employ a suggestive annotation strategy to choose the most effective annotation candidates. Then we use a fine annotation platform to reduce the annotation effort on each candidate and apply the new criterion to quantitatively calculate the effort doctors spend. We take MR brain tissue segmentation as an example to evaluate the proposed method. Extensive experiments on the well-known IBSR18 dataset and the MRBrainS18 Challenge dataset show that, with the proposed strategy, state-of-the-art segmentation performance can be achieved using only 60% of the annotation candidates, and annotation effort can be reduced by at least 44%, 44%, and 47% on CSF, GM, and WM, respectively.

CR · May 2, 2018

Energy-Efficient Wireless Powered Secure Transmission with Cooperative Jamming for Public Transportation

Linqing Gui, Feifei Bao, Xiaobo Zhou et al.

In this paper, wireless power transfer and cooperative jamming (CJ) are combined to enhance physical-layer security in public transportation networks. First, a new secure system model with both fixed and mobile jammers is proposed to guarantee secrecy in the worst-case scenario. All jammers are endowed with energy harvesting (EH) capability. Following this, two CJ-based schemes, namely B-CJ-SRM and B-CJ-TPM, are proposed, where SRM and TPM are short for secrecy rate maximization and transmit power minimization, respectively. They respectively maximize the secrecy rate (SR) under a transmit power constraint and minimize the transmit power of the base station (BS) under an SR constraint, by optimizing the beamforming vector and the artificial noise covariance matrix. To further reduce the complexity of the proposed optimal schemes, low-complexity (LC) versions, called LC-B-CJ-SRM and LC-B-CJ-TPM, are developed. Simulation results show that the proposed B-CJ-SRM and B-CJ-TPM schemes achieve significant SR improvements over existing zero-forcing and QoSD methods. Additionally, the SR performance of the proposed LC schemes is close to that of their original versions.