Qi Wang

h-index20

4papers

145citations

Novelty51%

AI Score37

Ranked #93,090 of 194,257 authors (top 48%)#31,282 in CV (top 53%)

4 Papers

11.1LGJun 17, 2022

Holistic Transformer: A Joint Neural Network for Trajectory Prediction and Decision-Making of Autonomous Vehicles

Hongyu Hu, Qi Wang, Zhengguang Zhang et al.

Trajectory prediction and behavioral decision-making are two important tasks for autonomous vehicles that require good understanding of the environmental context; behavioral decisions are better made by referring to the outputs of trajectory predictions. However, most current solutions perform these two tasks separately. Therefore, a joint neural network that combines multiple cues is proposed and named as the holistic transformer to predict trajectories and make behavioral decisions simultaneously. To better explore the intrinsic relationships between cues, the network uses existing knowledge and adopts three kinds of attention mechanisms: the sparse multi-head type for reducing noise impact, feature selection sparse type for optimally using partial prior knowledge, and multi-head with sigmoid activation type for optimally using posteriori knowledge. Compared with other trajectory prediction models, the proposed model has better comprehensive performance and good interpretability. Perceptual noise robustness experiments demonstrate that the proposed model has good noise robustness. Thus, simultaneous trajectory prediction and behavioral decision-making combining multiple cues can reduce computational costs and enhance semantic relationships between scenes and agents.

4.1LGOct 9, 2025

A Unified Multi-Task Learning Framework for Generative Auto-Bidding with Validation-Aligned Optimization

Yiqin Lv, Zhiyu Mou, Miao Xu et al. · tsinghua

In online advertising, heterogeneous advertiser requirements give rise to numerous customized bidding tasks that are typically optimized independently, resulting in extensive computation and limited data efficiency. Multi-task learning offers a principled framework to train these tasks jointly through shared representations. However, existing multi-task optimization strategies are primarily guided by training dynamics and often generalize poorly in volatile bidding environments. To this end, we present Validation-Aligned Multi-task Optimization (VAMO), which adaptively assigns task weights based on the alignment between per-task training gradients and a held-out validation gradient, thereby steering updates toward validation improvement and better matching deployment objectives. We further equip the framework with a periodicity-aware temporal module and couple it with an advanced generative auto-bidding backbone to enhance cross-task transfer of seasonal structure and strengthen bidding performance. Meanwhile, we provide theoretical insights into the proposed method, e.g., convergence guarantee and alignment analysis. Extensive experiments on both simulated and large-scale real-world advertising systems consistently demonstrate significant improvements over typical baselines, illuminating the effectiveness of the proposed approach.

6.2CVApr 5, 2025

Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning

Yuyang Zhang, Baao Xie, Hu Zhu et al.

Gaussian Splatting (GS) has recently marked a significant advancement in 3D reconstruction, delivering both rapid rendering and high-quality results. However, existing 3DGS methods pose challenges in understanding underlying 3D semantics, which hinders model controllability and interpretability. To address it, we propose an interpretable single-view 3DGS framework, termed 3DisGS, to discover both coarse- and fine-grained 3D semantics via hierarchical disentangled representation learning (DRL). Specifically, the model employs a dual-branch architecture, consisting of a point cloud initialization branch and a triplane-Gaussian generation branch, to achieve coarse-grained disentanglement by separating 3D geometry and visual appearance features. Subsequently, fine-grained semantic representations within each modality are further discovered through DRL-based encoder-adapters. To our knowledge, this is the first work to achieve unsupervised interpretable 3DGS. Evaluations indicate that our model achieves 3D disentanglement while preserving high-quality and rapid reconstruction.

9.0CVApr 30, 2019

Anomaly Detection in Traffic Scenes via Spatial-aware Motion Reconstruction

Yuan Yuan, Dong Wang, Qi Wang

Anomaly detection from a driver's perspective when driving is important to autonomous vehicles. As a part of Advanced Driver Assistance Systems (ADAS), it can remind the driver about dangers timely. Compared with traditional studied scenes such as the university campus and market surveillance videos, it is difficult to detect abnormal event from a driver's perspective due to camera waggle, abidingly moving background, drastic change of vehicle velocity, etc. To tackle these specific problems, this paper proposes a spatial localization constrained sparse coding approach for anomaly detection in traffic scenes, which firstly measures the abnormality of motion orientation and magnitude respectively and then fuses these two aspects to obtain a robust detection result. The main contributions are threefold: 1) This work describes the motion orientation and magnitude of the object respectively in a new way, which is demonstrated to be better than the traditional motion descriptors. 2) The spatial localization of object is taken into account of the sparse reconstruction framework, which utilizes the scene's structural information and outperforms the conventional sparse coding methods. 3) Results of motion orientation and magnitude are adaptively weighted and fused by a Bayesian model, which makes the proposed method more robust and handle more kinds of abnormal events. The efficiency and effectiveness of the proposed method are validated by testing on nine difficult video sequences captured by ourselves. Observed from the experimental results, the proposed method is more effective and efficient than the popular competitors, and yields a higher performance.