Jiawei Hou

RO
h-index: 8
13 papers
171 citations
Novelty: 50%
AI Score: 38

13 Papers

CV | Feb 25, 2023 | Code
SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Jiawei Hou, Qi Chen, Yurong Cheng et al.

Automatic underground parking has attracted considerable attention as the scope of autonomous driving expands. The auto-vehicle is supposed to obtain the environmental information, track its location, and build a reliable map of the scenario. Mainstream solutions consist of well-trained neural networks and simultaneous localization and mapping (SLAM) methods, which need numerous carefully labeled images and multiple sensor estimations. However, there is a lack of underground parking scenario datasets with multiple sensors and well-labeled images that support both SLAM tasks and perception tasks, such as semantic segmentation and parking slot detection. In this paper, we present SUPS, a simulated dataset for underground automatic parking, which supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images according to timestamps. We intend to address the shortcomings of existing datasets through the variability of environments and the diversity and accessibility of sensors in the virtual scene. Specifically, the dataset records frames from four surrounding fisheye cameras, two forward pinhole cameras, and a depth camera, as well as data from LiDAR, an inertial measurement unit (IMU), and GNSS. Pixel-level semantic labels are provided for objects, especially ground signs such as arrows, parking lines, lanes, and speed bumps. Our dataset supports perception, 3D reconstruction, depth estimation, SLAM, and other related tasks. We also evaluate state-of-the-art SLAM algorithms and perception models on our dataset. Finally, we open-source our virtual 3D scene, built on the Unity Engine, and release our dataset at https://github.com/jarvishou829/SUPS.
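
The abstract above mentions labels aligned with successive images according to timestamps. A minimal sketch of nearest-timestamp matching follows, assuming both streams carry sorted timestamps in seconds. The function name `align_by_timestamp` and the tolerance `max_gap` are hypothetical and are not taken from the SUPS tooling.

```python
from bisect import bisect_left


def align_by_timestamp(image_ts, label_ts, max_gap=0.05):
    """For each image timestamp, return the index of the nearest label
    timestamp, or None if the nearest one is more than max_gap seconds away.
    Both lists are assumed to be sorted in ascending order."""
    matches = []
    for t in image_ts:
        i = bisect_left(label_ts, t)
        # Only the label just before t and the one at/after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(label_ts)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(label_ts[j] - t))
        matches.append(best if abs(label_ts[best] - t) <= max_gap else None)
    return matches
```

Calling `align_by_timestamp([0.00, 0.10, 0.20, 0.50], [0.01, 0.11, 0.19])` pairs the first three images with labels 0, 1, and 2, and leaves the last image unmatched because no label lies within the tolerance.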

CV | Mar 5, 2024
FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

Jiawei Hou, Xiaoyan Li, Wenhao Guan et al.

In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent work has extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency of four parts, namely the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. To improve this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.

RO | May 30, 2025
Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments

Jiawei Hou, Xiangyang Xue, Taiping Zeng

Autonomous operation of service robots in human-centric scenes remains challenging due to the need for understanding of changing environments and context-aware decision-making. While existing approaches like topological maps offer efficient spatial priors, they fail to model transient object relationships, whereas dense neural representations (e.g., NeRF) incur prohibitive computational costs. Inspired by the hierarchical scene representation and video scene graph generation works, we propose Hi-Dyna Graph, a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. Our framework constructs a global topological graph from posed RGB-D inputs, encoding room-scale connectivity and large static objects (e.g., furniture), while environmental and egocentric cameras populate dynamic subgraphs with object position relations and human-object interaction patterns. A hybrid architecture is constructed by anchoring these subgraphs to the global topology using semantic and spatial constraints, enabling seamless updates as the environment evolves. An agent powered by large language models (LLMs) is employed to interpret the unified graph, infer latent task triggers, and generate executable instructions grounded in robotic affordances. We conduct complex experiments to demonstrate Hi-Dyna Graph's superior scene representation effectiveness. Real-world deployments validate the system's practicality with a mobile manipulator: the robot autonomously completes complex tasks with no further training or complex reward design in a dynamic scene as a cafeteria assistant. See https://anonymous.4open.science/r/Hi-Dyna-Graph-B326 for video demonstration and more details.

CV | Nov 24, 2025
DetAny4D: Detect Anything 4D Temporally in a Streaming RGB Video

Jiawei Hou, Shenghao Zhang, Can Wang et al.

Reliable 4D object detection, which refers to 3D object detection in streaming video, is crucial for perceiving and understanding the real world. Existing open-set 4D object detection methods typically make predictions on a frame-by-frame basis without modeling temporal consistency, or rely on complex multi-stage pipelines that are prone to error propagation across cascaded stages. Progress in this area has been hindered by the lack of large-scale datasets that capture continuous reliable 3D bounding box (b-box) annotations. To overcome these challenges, we first introduce DA4D, a large-scale 4D detection dataset containing over 280k sequences with high-quality b-box annotations collected under diverse conditions. Building on DA4D, we propose DetAny4D, an open-set end-to-end framework that predicts 3D b-boxes directly from sequential inputs. DetAny4D fuses multi-modal features from pre-trained foundational models and designs a geometry-aware spatiotemporal decoder to effectively capture both spatial and temporal dynamics. Furthermore, it adopts a multi-task learning architecture coupled with a dedicated training strategy to maintain global consistency across sequences of varying lengths. Extensive experiments show that DetAny4D achieves competitive detection accuracy and significantly improves temporal stability, effectively addressing long-standing issues of jitter and inconsistency in 4D object detection. Data and code will be released upon acceptance.

RO | Nov 16, 2021
Hierarchical Topometric Representation of 3D Robotic Maps

Zhenpeng He, Hao Sun, Jiawei Hou et al.

In this paper, we propose a method for generating a hierarchical, volumetric topological map from 3D point clouds. There are three basic hierarchical levels in our map: storey, region, and volume. The advantages of our method are reflected in both input and output. In terms of input, we accept multi-storey point clouds and building structures with sloping roofs or ceilings. In terms of output, we can generate results with metric information of different dimensionality, which are suitable for different robotics applications. The algorithm generates the volumetric representation by generating volumes from a 3D voxel occupancy map. We then add passages (connections between volumes), combine small volumes into a big region, and use a 2D segmentation method for better topological representation. We evaluate our method on several freely available datasets. The experiments highlight the advantages of our approach.

IR | Nov 16, 2021
QA4PRF: A Question Answering based Framework for Pseudo Relevance Feedback

Handong Ma, Jiawei Hou, Chenxu Zhu et al.

Pseudo relevance feedback (PRF) automatically performs query expansion based on top-retrieved documents to better represent the user's information need so as to improve the search results. Previous PRF methods mainly select expansion terms with high occurrence frequency in top-retrieved documents or with high semantic similarity with the original query. However, existing PRF methods hardly try to understand the content of documents, which is very important in performing effective query expansion to reveal the user's information need. In this paper, we propose a QA-based framework for PRF called QA4PRF to utilize contextual information in documents. In such a framework, we formulate PRF as a QA task, where the query and each top-retrieved document play the roles of question and context in the corresponding QA system, while the objective is to find some proper terms to expand the original query by utilizing contextual information, which are analogous to answers in a QA task. Besides, an attention-based pointer network is built to understand the content of top-retrieved documents and select the terms that better represent the original query. We also show that incorporating traditional supervised learning methods, such as LambdaRank, to integrate PRF information further improves the performance of QA4PRF. Extensive experiments on three real-world datasets demonstrate that QA4PRF significantly outperforms the state-of-the-art methods.
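
For context, the frequency-based selection that the abstract attributes to previous PRF methods can be sketched as below. This illustrates that baseline only, not QA4PRF itself; the function name `expand_query`, whitespace tokenization, and the `stopwords` parameter are assumptions made for the example.

```python
from collections import Counter


def expand_query(query, top_docs, k=3, stopwords=frozenset()):
    """Frequency-based PRF baseline: append the k most frequent terms from
    the top-retrieved documents that are not already in the query."""
    query_terms = query.lower().split()
    seen = set(query_terms)
    counts = Counter(
        term
        for doc in top_docs
        for term in doc.lower().split()
        if term not in seen and term not in stopwords
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]
```

Because terms are chosen by raw counts, a frequent but uninformative word can be picked over a rarer, more relevant one, which is the limitation the abstract raises when it says such methods do not try to understand document content.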

IR | Nov 1, 2020
U-rank: Utility-oriented Learning to Rank with Implicit Feedback

Xinyi Dai, Jiawei Hou, Qing Liu et al.

Learning to rank with implicit feedback is one of the most important tasks in many real-world information systems where the objective is some specific utility, e.g., clicks and revenue. However, we point out that existing methods based on the probabilistic ranking principle do not necessarily achieve the highest utility. To this end, we propose a novel ranking framework called U-rank that directly optimizes the expected utility of the ranking list. With a position-aware deep click-through rate prediction model, we address the attention bias considering both query-level and item-level features. Due to the item-specific attention bias modeling, the optimization for expected utility corresponds to a maximum weight matching on the item-position bipartite graph. We base the optimization of this objective on an efficient Lambdaloss framework, which is supported by both theoretical and empirical analysis. We conduct extensive experiments for both web search and recommender systems over three benchmark datasets and two proprietary datasets, where the performance gain of U-rank over state-of-the-art methods is demonstrated. Moreover, our proposed U-rank has been deployed on a large-scale commercial recommender, and a large improvement over the production baseline has been observed in online A/B testing.
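
The matching view described above can be illustrated on a toy example. The sketch below assumes a hypothetical table `weights[i][p]` holding the expected utility of showing item `i` at position `p`, and finds the best assignment by exhaustive search. It stands in for a proper maximum-weight matching solver and is not the Lambdaloss-based optimization used in U-rank.

```python
from itertools import permutations


def best_ranking(weights):
    """weights[i][p] is the (assumed) expected utility of showing item i at
    position p. Returns the item order with the highest total utility and
    that utility, found by brute force (small inputs only; requires at
    least as many items as positions)."""
    n_items = len(weights)
    n_positions = len(weights[0])
    best_order, best_value = None, float("-inf")
    for order in permutations(range(n_items), n_positions):
        value = sum(weights[item][pos] for pos, item in enumerate(order))
        if value > best_value:
            best_order, best_value = order, value
    return list(best_order), best_value
```

With `weights = [[0.9, 0.5], [0.8, 0.1], [0.3, 0.2]]`, the best order is item 1 followed by item 0, even though item 0 has the highest utility at the first position. When attention bias is item-specific, sorting items by a single score can therefore miss the utility-maximizing list.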

RO | Mar 11, 2020
Self-supervised Point Set Local Descriptors for Point Cloud Registration

Yijun Yuan, Jiawei Hou, Andreas Nüchter et al.

In this work, we propose to learn local descriptors for point clouds in a self-supervised manner. In each iteration of the training, the input of the network is merely one unlabeled point cloud. Building on our previous work, which directly solves the transformation between two point sets in one step without correspondences, the proposed method is able to train from one point cloud by supervising its self-rotation, which we randomly generate. The whole training requires no manual annotation. In several experiments, we evaluate the performance of our method on various datasets and compare it to other state-of-the-art algorithms. The results show that our self-supervised learned descriptor achieves equivalent or even better performance than the supervised learned model, while being easier to train and not requiring labeled data.
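
The self-rotation supervision described above needs training pairs made of one cloud and a randomly rotated copy, with the rotation serving as the label. A minimal sketch of generating such a pair in plain Python follows. The helper names are hypothetical, the rotation is drawn from a normalized Gaussian quaternion, and the descriptor network and loss are not shown.

```python
import math
import random


def random_rotation(rng):
    """Rotation matrix built from a random unit quaternion (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(points, R):
    """Apply rotation matrix R to every 3D point."""
    return [
        tuple(sum(R[r][c] * p[c] for c in range(3)) for r in range(3))
        for p in points
    ]


def make_training_pair(points, rng):
    """Return (original cloud, rotated copy, rotation used as the label)."""
    R = random_rotation(rng)
    return points, rotate(points, R), R
```

Since the rotation is known by construction, it can act as ground truth for the estimated transformation between the two copies, so no manual annotation is needed.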

RO | Nov 18, 2019
Fast 2D Map Matching Based on Area Graphs

Jiawei Hou, Haofei Kuang, Sören Schwertfeger

We present a novel area matching algorithm for merging two different 2D grid maps. There are many approaches to this problem; nevertheless, most previous work is built on assumptions such as rigid transformation or similar scale and modalities of the two maps. In this work, we propose a 2D map matching algorithm based on area segmentation. We convert general 2D occupancy grid maps to an area graph representation, then compute the correct results by voting in that space. In the experiments, we compare with a state-of-the-art method applied to the matching of sensor maps with ground truth layout maps. The experiments show that our algorithm performs better on large-scale maps and computes faster.

RO | Nov 2, 2019
Furniture Free Mapping using 3D Lidars

Zhenpeng He, Jiawei Hou, Sören Schwertfeger

Mobile robots depend on maps for localization, planning, and other applications. In indoor scenarios, there is often lots of clutter present, such as chairs, tables, other furniture, or plants. While mapping this clutter is important for certain applications, for example navigation, maps that represent just the immobile parts of the environment, i.e. walls, are needed for other applications, like room segmentation or long-term localization. In the literature, approaches can be found that use a complete point cloud to remove the furniture in the room and generate a furniture free map. In contrast, we propose a Simultaneous Localization And Mapping (SLAM)-based mobile laser scanning solution. The robot uses an orthogonal pair of Lidars. The horizontal scanner aims to estimate the robot position, whereas the vertical scanner generates the furniture free map. There are three steps in our method: point cloud rearrangement, wall plane detection, and semantic labeling. In the experiment, we evaluate the efficiency of removing furniture in a typical indoor environment. We get 99.60% precision in keeping the wall in the 3D result, which shows that our algorithm can remove most of the furniture in the environment. Furthermore, we introduce the application of 2D furniture free mapping for room segmentation.

RO | Oct 1, 2019
Area Graph: Generation of Topological Maps using the Voronoi Diagram

Jiawei Hou, Yijun Yuan, Sören Schwertfeger

Representing a scanned map of the real environment as a topological structure is an important research topic in robotics. Since topological representations of maps save a huge amount of map storage space and online computing time, they are widely used in fields such as path planning, map matching, and semantic mapping. We use a topological map representation, the Area Graph, in which the vertices represent areas and edges represent passages. The Area Graph is developed from a pruned Voronoi Graph, the Topology Graph. We also employ a simple room detection algorithm to compensate for the fact that the Voronoi Graph becomes unstable in open areas. We claim that our area segmentation method is superior to state-of-the-art approaches in complex indoor environments and support this claim with a number of experiments.

RO | Sep 23, 2019
Improving CNN-based Planar Object Detection with Geometric Prior Knowledge

Jianxiong Cai, Jiawei Hou, Yiren Lu et al.

In this paper, we focus on the question: how might mobile robots take advantage of affordable RGB-D sensors for object detection? Although current CNN-based object detectors have achieved impressive results, there are three main drawbacks for practical usage on mobile robots: 1) It is hard and time-consuming to collect and annotate large-scale training sets. 2) It usually needs a long training time. 3) CNN-based object detection shows significant weakness in predicting location. We propose an improved method for the detection of planar objects, which rectifies images with geometric information to compensate for the perspective distortion before feeding them to the CNN detector module, typically a CNN-based detector like YOLO or Mask R-CNN. By dealing with the perspective distortion in advance, we eliminate the need for the CNN detector to learn it. Experiments show that this approach significantly boosts the detection performance. Besides, it effectively reduces the number of training images required. In addition to the novel detection framework proposed, we also release an RGB-D dataset and source code for hazmat sign detection. To the best of our knowledge, this is the first work on image rectification for CNN-based object detection, and the dataset is the first publicly available hazmat sign detection dataset with RGB-D sensors.

RO | Nov 13, 2018
Topological Area Graph Generation and its Application to Path Planning

Jiawei Hou, Yijun Yuan, Sören Schwertfeger

Representing a scanned map of the real environment as a topological structure is an important research topic in robotics. Since topological representations of maps save a huge amount of map storage space and online computing time, they are widely used in fields such as path planning, map matching, and semantic mapping. We propose a novel topological map representation, the Area Graph, in which the vertices represent areas and edges represent passages. The Area Graph is developed from a pruned Voronoi Graph, the Topology Graph. The paper also presents path planning as one application for the Area Graph. For that, we derive a so-called Passage Graph from the Area Graph. Because our algorithm segments the map as a set of areas, the first experiment compares the results of the Area Graph with those of state-of-the-art segmentation approaches, showing that our method effectively prevents over-segmentation. The second experiment then shows the superiority of our method over the traditional A* planning algorithm.