RO May 21
Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments
Jie Jia, Yaofeng Su, Zeyu Bao et al.
Occlusion-aware prediction remains a critical challenge in autonomous driving due to the inherent uncertainty of unobserved regions. Existing approaches either overestimate risk based on reachable states or struggle to predict accurate trajectories under high occlusion uncertainty. To address these limitations, we propose a unified risk map modeling and learning framework for partially observable environments. Our method integrates traffic flow risk and collision risk through spatiotemporal modeling, enabling fine-grained assessment of occlusion-induced hazards. To address the scarcity of scenarios involving occluded interactions, we introduce a diffusion-based scenario generation framework that produces realistic yet adversarial scenarios. We integrate the modeling and learning of a unified risk map into a framework that supports risk-aware planning under partial observability. Experiments on the Waymo Open Motion Dataset show that our method significantly outperforms the state-of-the-art occlusion-aware baseline, improving minimum time-to-collision by 0.78 times and average time-to-collision by 1.67 times. The proposed framework offers a comprehensive and practical solution for risk-aware planning in partially observable environments.
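The time-to-collision figures quoted above can be made concrete with a small sketch. The abstract does not define how time-to-collision is computed, so the version below is an assumption: two agents are modeled as discs moving at constant velocity, and the function names, the `radius` parameter, and the rule of averaging only finite values are all hypothetical choices for illustration.

```python
import math

def time_to_collision(p_ego, v_ego, p_other, v_other, radius=2.0):
    """Earliest t >= 0 at which two constant-velocity agents come within
    `radius` of each other; math.inf if that never happens.
    (Hypothetical definition, not taken from the paper.)"""
    # Position and velocity of the other agent relative to the ego vehicle.
    rx, ry = p_other[0] - p_ego[0], p_other[1] - p_ego[1]
    vx, vy = v_other[0] - v_ego[0], v_other[1] - v_ego[1]
    # Solve |r + v t|^2 = radius^2, a quadratic a t^2 + b t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - radius * radius
    if c <= 0.0:
        return 0.0          # already within the collision radius
    if a == 0.0:
        return math.inf     # no relative motion, the gap never closes
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf     # paths never come within `radius`
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf

def summarize_ttc(ttcs):
    """Return (minimum, mean) over the finite TTC values in `ttcs`."""
    finite = [t for t in ttcs if math.isfinite(t)]
    if not finite:
        return math.inf, math.inf
    return min(finite), sum(finite) / len(finite)
```

For example, an ego vehicle at the origin driving at 10 m/s toward a stationary agent 50 m ahead reaches the 2 m collision radius after 48 m, so `time_to_collision((0, 0), (10, 0), (50, 0), (0, 0))` is 4.8 seconds.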
SE May 18
ProcBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents
Jiawei He, Jie Jia, Chenbo Liu et al.
Existing benchmarks for LLM coding agents mainly evaluate final outcomes, such as task completion, compilation success, and test pass rates. While these metrics are useful for measuring end-task capability, they provide limited visibility into how an execution unfolds and often miss recurrent process-level failures that arise during multi-step operation. We present ProcBench, a benchmark-oriented framework for evaluating coding-agent trajectories through process defects and control preservation. ProcBench organizes execution failures into a reusable ontology, standardizes heterogeneous logs into a unified trajectory representation, and reports calibrated risk-based scorecards instead of relying only on final outcomes. We instantiate ProcBench on an annotated set of 200 trajectories and apply it across three coding-agent benchmarks: AndroidBench, TerminalBench, and SWE-bench-Verified. Our results suggest that ProcBench can be instantiated with useful reliability, that calibration improves the empirical interpretability of defect findings relative to direct thresholding, and that process-aware scorecards provide diagnostic distinctions beyond conventional outcome-based evaluation. We also discuss limitations, including annotation dependence, partial observability for some defect classes, and the need for broader external validation.
CV Apr 10, 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie, Julong Wei, Zhengjun Wang et al.
Online construction of open-ended language scenes is crucial for robotic applications, where open-vocabulary interactive scene understanding is required. Recently, neural implicit representation has provided a promising direction for online interactive mapping. However, bringing open-vocabulary scene understanding into online neural implicit mapping still faces three challenges: a lack of local scene updating ability, blurry spatial hierarchical semantic segmentation, and difficulty in maintaining multi-view consistency. To this end, we propose O2V-mapping, which utilizes voxel-based language and geometric features to create an open-vocabulary field, thus allowing for local updates during the online training process. Additionally, we leverage a foundation model for image segmentation to extract language features on object-level entities, achieving clear segmentation boundaries and hierarchical semantic features. To preserve consistency in 3D object properties across different viewpoints, we propose a spatial adaptive voxel adjustment mechanism and a multi-view weight selection method. Extensive experiments on open-vocabulary object localization and semantic segmentation demonstrate that O2V-mapping achieves online construction of language scenes while enhancing accuracy, outperforming the previous SOTA method.
SP Sep 24, 2021
Indoor Localization Using Smartphone Magnetic with Multi-Scale TCN and LSTM
Mingyang Zhang, Jie Jia, Jian Chen
A novel magnetic localization approach based on a multi-scale temporal convolutional network (TCN) and a long short-term memory network (LSTM) is proposed. To enhance the discernibility of geomagnetic signals, a time-series preprocessing approach is first constructed. Next, the TCN is invoked to expand the feature dimensions while keeping the time-series characteristics of the LSTM model. Then, a multi-scale time-series layer is constructed with multiple TCNs of different dilation factors to address the problem of inconsistent time-series speed between the localization model and mobile users. Finally, a stacking framework of multi-scale TCN and LSTM is proposed for indoor magnetic localization. Experimental results demonstrate the effectiveness of the proposed algorithm in indoor localization.
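The idea of running several TCN branches with different dilation factors can be illustrated with a minimal sketch. The abstract gives no layer sizes, kernels, or dilation values, so everything below is an assumption: a single causal dilated 1D convolution in plain Python, applied once per dilation, with the hypothetical names `causal_dilated_conv1d` and `multi_scale_features`. It is not the authors' architecture, only the basic operation such branches are built from.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated convolution: y[t] = sum_k kernel[k] * x[t - k*dilation].
    Samples before the start of the sequence are treated as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def multi_scale_features(x, kernel, dilations=(1, 2, 4)):
    """One output sequence per dilation factor (hypothetical defaults).
    A larger dilation makes the same kernel span a longer time window."""
    return [causal_dilated_conv1d(x, kernel, d) for d in dilations]

def receptive_field(kernel_size, dilation):
    """Number of time steps covered by one dilated kernel."""
    return 1 + (kernel_size - 1) * dilation
```

With a kernel of size 2, dilations 1, 2 and 4 cover 2, 3 and 5 time steps respectively, which is one way a model could compare the same magnetic sequence at different walking speeds.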
AS Feb 4, 2021
VSEGAN: Visual Speech Enhancement Generative Adversarial Network
Xinmeng Xu, Yang Wang, Dongxiang Xu et al.
Speech enhancement is an essential task of improving speech quality in noisy scenarios. Several state-of-the-art approaches have introduced visual information for speech enhancement, since the visual aspect of speech is essentially unaffected by the acoustic environment. This paper proposes a novel framework that involves visual information for speech enhancement by incorporating a Generative Adversarial Network (GAN). In particular, the proposed visual speech enhancement GAN consists of two networks trained in an adversarial manner: i) a generator that adopts a multi-layer feature fusion convolution network to enhance input noisy speech, and ii) a discriminator that attempts to minimize the discrepancy between the distributions of the clean speech signal and the enhanced speech signal. Experimental results demonstrated superior performance of the proposed model against several state-of-the-art
HC May 14, 2018
BCI-Based Strategies on Stroke Rehabilitation with Avatar and FES Feedback
Zhaoyang Qiu, Shugeng Chen, Ian Daly et al.
Stroke is the leading cause of serious and long-term disability worldwide. Some studies have shown that motor imagery (MI) based BCI has a positive effect in poststroke rehabilitation. It could help patients promote the reorganization processes in the damaged brain regions. However, offline motor imagery and conventional online motor imagery with feedback (such as rewarding sounds and movements of an avatar) cannot reflect the true intention of the patients. In this study, both virtual limbs and functional electrical stimulation (FES) were used as feedback to provide patients with closed-loop sensorimotor integration for motor rehabilitation. The FES system was activated when the user imagined hand movement on the instructed side. Ten stroke patients (7 male, aged 22-70 years, mean 49.5 ± 15.1) were involved in this study. All of them participated in BCI-FES rehabilitation training for 4 weeks. The average motor imagery accuracy of the ten patients in the last week was 71.3%, an improvement of 3% over the first week. Five patients' Fugl-Meyer Assessment (FMA) scores improved. Patient 6, who had suffered from stroke for over two years, achieved the greatest improvement after rehabilitation training (pre FMA: 20, post FMA: 35). In terms of brain patterns, the active patterns of the five patients gradually became centralized and shifted to sensorimotor areas (channels C3 and C4) and the premotor area (channels FC3 and FC4). In this study, a motor imagery based BCI and an FES system were combined to provide stroke patients with closed-loop sensorimotor integration for motor rehabilitation. The results showed evidence that the BCI-FES system is effective in restoring upper-extremity motor function after stroke. In future work, more cases are needed to demonstrate its superiority over conventional therapy and to explore the potential role of MI in poststroke rehabilitation.
LG Apr 26, 2018
Generative Model for Heterogeneous Inference
Honggang Zhou, Yunchun Li, Hailong Yang et al.
Generative models (GMs) such as the Generative Adversarial Network (GAN) and the Variational Auto-Encoder (VAE) have thrived in recent years and achieved high-quality results in generating new samples. In Computer Vision especially, GMs have been used in image inpainting, denoising and completion, which can be treated as inference from observed pixels to corrupted pixels. However, images are hierarchically structured, which is quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by a Bayesian Network (BN). However, learning and inference in a BN model are NP-hard, so the number of stochastic variables in a BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time. We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Besides black-box analysis, we have also done a series of experiments on Markov border inference of GMs for white-box analysis and give theoretical results.
LG Dec 31, 2017
Using Deep Neural Network Approximate Bayesian Network
Jie Jia, Honggang Zhou, Yunchun Li
We present a new method to approximate posterior probabilities of a Bayesian Network using a Deep Neural Network. Experimental results on several public Bayesian Network datasets show that a Deep Neural Network is capable of learning the joint probability distribution of a Bayesian Network with high accuracy by learning from a few pairs of observations and posterior probability distributions. Compared with the traditional approximate method, the likelihood weighting sampling algorithm, our method is much faster and achieves higher accuracy on medium-sized Bayesian Networks. Another advantage of our method is that it can be parallelized much more easily on GPUs without extra effort. We also explored the connection between the accuracy of our model and the number of training examples. The results show that our model saturates as the number of training examples grows, and that we do not need many training examples to get reasonably good results. Another contribution of our work is showing that a discriminative model like a Deep Neural Network can approximate a generative model like a Bayesian Network.
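For readers unfamiliar with the baseline named above, likelihood weighting can be sketched on a toy network. The three-node network (Rain, Sprinkler, WetGrass) and all probabilities below are hypothetical and are not taken from the paper's datasets; the sketch only shows the general technique of sampling non-evidence variables from their priors and weighting each sample by the likelihood of the evidence.

```python
import random

# Hypothetical toy network: Rain -> WetGrass <- Sprinkler.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = 1 | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET = {(0, 0): 0.05, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}

def likelihood_weighting(n_samples, seed=0):
    """Estimate P(Rain = 1 | WetGrass = 1) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_rain = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        # Sample the non-evidence variables from their priors.
        rain = 1 if rng.random() < P_RAIN else 0
        sprinkler = 1 if rng.random() < P_SPRINKLER else 0
        # Weight the sample by the likelihood of the evidence WetGrass = 1.
        weight = P_WET[(rain, sprinkler)]
        weighted_rain += weight * rain
        total_weight += weight
    return weighted_rain / total_weight

def exact_posterior():
    """Exact P(Rain = 1 | WetGrass = 1) by enumeration, for comparison."""
    joint = {0: 0.0, 1: 0.0}
    for rain in (0, 1):
        p_r = P_RAIN if rain else 1.0 - P_RAIN
        for sprinkler in (0, 1):
            p_s = P_SPRINKLER if sprinkler else 1.0 - P_SPRINKLER
            joint[rain] += p_r * p_s * P_WET[(rain, sprinkler)]
    return joint[1] / (joint[0] + joint[1])
```

On this toy network the exact posterior is 0.1638 / 0.2718, roughly 0.603, and the sampled estimate approaches it as `n_samples` grows. Each query needs a fresh batch of samples, which is the cost a learned approximator aims to avoid.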