Guoyou Wang

CV
h-index1
4papers
398citations
Novelty59%
AI Score43

4 Papers

CVOct 25, 2025
LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction

Yuhang Gao, Xiang Xiang, Sheng Zhong et al.

Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, using Poisson reconstruction to fill voids, and assigning semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition without dense pixel-level supervision, and our framework can also leverage existing ground truth to further improve performance. Our model predicts dense voxel features embedded in the CLIP feature space, integrating textual and image pixel information, and classifies based on text and semantic similarity. Experiments on the nuScenes dataset demonstrate the method's superior performance, achieving high-precision predictions for known classes and distinguishing unknown classes without additional training data.

LGJul 13, 2025
Discrete Differential Principle for Continuous Smooth Function Representation

Guoyou Wang, Yihua Tan, Shiqi Liu

Taylor's formula holds significant importance in function representation, such as solving differential difference equations, ordinary differential equations, partial differential equations, and further promotes applications in visual perception, complex control, fluid mechanics, weather forecasting and thermodynamics. However, the Taylor's formula suffers from the curse of dimensionality and error propagation during derivative computation in discrete situations. In this paper, we propose a new discrete differential operator to estimate derivatives and to represent continuous smooth function locally using the Vandermonde coefficient matrix derived from truncated Taylor series. Our method simultaneously computes all derivatives of orders less than the number of sample points, inherently mitigating error propagation. Utilizing equidistant uniform sampling, it achieves high-order accuracy while alleviating the curse of dimensionality. We mathematically establish rigorous error bounds for both derivative estimation and function representation, demonstrating tighter bounds for lower-order derivatives. We extend our method to the two-dimensional case, enabling its use for multivariate derivative calculations. Experiments demonstrate the effectiveness and superiority of the proposed method compared to the finite forward difference method for derivative estimation and cubic spline and linear interpolation for function representation. Consequently, our technique offers broad applicability across domains such as vision representation, feature extraction, fluid mechanics, and cross-media imaging.

CVFeb 12, 2020
Progressive Object Transfer Detection

Hao Chen, Yali Wang, Guoyou Wang et al.

Recent development of object detection mainly depends on deep learning with large-scale benchmarks. However, collecting such fully-annotated data is often difficult or expensive for real-world applications, which restricts the power of deep neural networks in practice. Alternatively, humans can detect new objects with little annotation burden, since humans often use the prior knowledge to identify new objects with few elaborately-annotated examples, and subsequently generalize this capacity by exploiting objects from wild images. Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework. Specifically, we make three main contributions in this paper. First, POTD can leverage various object supervision of different domains effectively into a progressive detection procedure. Via such human-like learning, one can boost a target detection task with few annotations. Second, POTD consists of two delicate transfer stages, i.e., Low-Shot Transfer Detection (LSTD), and Weakly-Supervised Transfer Detection (WSTD). In LSTD, we distill the implicit object knowledge of source detector to enhance target detector with few annotations. It can effectively warm up WSTD later on. In WSTD, we design a recurrent object labelling mechanism for learning to annotate weakly-labeled images. More importantly, we exploit the reliable object supervision from LSTD, which can further enhance the robustness of target detector in the WSTD stage. Finally, we perform extensive experiments on a number of challenging detection benchmarks with different settings. The results demonstrate that, our POTD outperforms the recent state-of-the-art approaches.

CVMar 5, 2018
LSTD: A Low-Shot Transfer Detector for Object Detection

Hao Chen, Yali Wang, Guoyou Wang et al.

Recent advances in object detection are mainly driven by deep learning with large-scale detection benchmarks. However, the fully-annotated training set is often limited for a target detection task, which may deteriorate the performance of deep detectors. To address this challenge, we propose a novel low-shot transfer detector (LSTD) in this paper, where we leverage rich source-domain knowledge to construct an effective target-domain detector with very few training examples. The main contributions are described as follows. First, we design a flexible deep architecture of LSTD to alleviate transfer difficulties in low-shot detection. This architecture can integrate the advantages of both SSD and Faster RCNN in a unified deep framework. Second, we introduce a novel regularized transfer learning framework for low-shot detection, where the transfer knowledge (TK) and background depression (BD) regularizations are proposed to leverage object knowledge respectively from source and target domains, in order to further enhance fine-tuning with a few target images. Finally, we examine our LSTD on a number of challenging low-shot detection experiments, where LSTD outperforms other state-of-the-art approaches. The results demonstrate that LSTD is a preferable deep detector for low-shot scenarios.