Dingwen Li

LG
4 papers
78 citations
Novelty 54%
AI Score 41

4 Papers

CV Sep 26, 2023
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation

Zeyu Wang, Dingwen Li, Chenxu Luo et al.

3D perception based on representations learned from multi-camera bird's-eye-view (BEV) is trending, as cameras are cost-effective for mass production in the autonomous driving industry. However, there is a distinct performance gap between multi-camera BEV and LiDAR-based 3D object detection. One key reason is that LiDAR captures accurate depth and other geometry measurements, while it is notoriously challenging to infer such 3D information from image input alone. In this work, we propose to boost the representation learning of a multi-camera BEV-based student detector by training it to imitate the features of a well-trained LiDAR-based teacher detector. We propose an effective balancing strategy that forces the student to focus on learning the crucial features from the teacher, and we generalize knowledge transfer to multi-scale layers with temporal fusion. We conduct extensive evaluations on multiple representative multi-camera BEV models. Experiments show that our approach yields significant improvements over the student models, leading to state-of-the-art performance on the popular nuScenes benchmark.
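As a rough illustration of feature imitation with a balancing term, the sketch below computes a squared-error loss between a student and a teacher BEV feature map and normalizes foreground and background cells separately, so that sparse object regions are not swamped by the background. The function name `balanced_imitation_loss`, the foreground mask, and the two weights are hypothetical and are not taken from the paper; the actual balancing strategy in DistillBEV may differ.

```python
def balanced_imitation_loss(student, teacher, fg_mask, fg_weight=1.0, bg_weight=1.0):
    """Squared-error imitation loss on a 2D feature grid (illustrative only).

    student, teacher: nested lists of floats with the same shape.
    fg_mask: nested list of 0/1 flags marking assumed foreground cells.
    Foreground and background errors are averaged separately, then
    combined with the (hypothetical) weights fg_weight and bg_weight.
    """
    fg_sum = bg_sum = 0.0
    fg_count = bg_count = 0
    for s_row, t_row, m_row in zip(student, teacher, fg_mask):
        for s, t, m in zip(s_row, t_row, m_row):
            err = (s - t) ** 2
            if m:
                fg_sum += err
                fg_count += 1
            else:
                bg_sum += err
                bg_count += 1
    # Guard against empty regions so the loss stays defined.
    fg_term = fg_sum / fg_count if fg_count else 0.0
    bg_term = bg_sum / bg_count if bg_count else 0.0
    return fg_weight * fg_term + bg_weight * bg_term


# Toy 2x2 grid: one foreground cell, three background cells.
student = [[1.0, 0.0], [0.0, 0.0]]
teacher = [[0.0, 0.0], [0.0, 1.0]]
mask = [[1, 0], [0, 0]]
loss = balanced_imitation_loss(student, teacher, mask, fg_weight=2.0, bg_weight=1.0)
```

Averaging each region by its own cell count is one simple way to keep the few foreground cells from being outvoted; a real detector would apply the same idea to multi-channel tensors at several feature scales.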

LG Oct 10, 2022
Self-explaining Hierarchical Model for Intraoperative Time Series

Dingwen Li, Bing Xue, Christopher King et al.

Major postoperative complications are devastating to surgical patients. Some of these complications are potentially preventable via early predictions based on intraoperative data. However, intraoperative data comprise long and fine-grained multivariate time series, which hinders the effective learning of accurate models. The large gaps associated with clinical events and protocols are usually ignored. Moreover, deep models generally lack transparency, yet interpretability is crucial for helping clinicians plan and deliver postoperative care and timely interventions. To this end, we propose a hierarchical model combining the strengths of both attention and recurrent models for intraoperative time series. We further develop an explanation module for the hierarchical model that interprets the predictions by providing fine-grained contributions of the intraoperative data. Experiments on a large dataset of 111,888 surgeries with multiple outcomes and an external high-resolution ICU dataset show that our model can achieve strong predictive performance (i.e., high accuracy) and offer robust interpretations (i.e., high transparency) for predicted outcomes based on intraoperative time series.

38.0 CL May 1
Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

Charles Weng, Dingwen Li, Alexander Martin

Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when the binary label is constrained to a fixed output position, equivalent prompts can induce materially different unsafe probabilities for the same sample. Across multimodal safety benchmarks and multiple VLM families, cross-prompt variance is strongly associated with prompt-level disagreement and higher error, making it a useful fragility diagnostic. A training-free mean ensemble improves NLL on all 14 dataset-model evaluation pairs and ECE on 12/14 relative to a train-selected single-prompt baseline, and wins more head-to-head NLL comparisons than labeled temperature scaling, Platt scaling, and isotonic regression applied to the same prompt. Ranking gains are consistent against the train-selected baseline on both AUROC and AUPRC, and against the full 15-prompt distribution remain consistent on AUPRC while softening on AUROC. Labeled calibration on top of the mean provides further gains when labels are available, identifying prompt averaging as a strong label-free first stage rather than a replacement for calibration. We frame this as a reliability stress test for zero-shot VLM first-token safety scores and recommend prompt-family evaluation with mean aggregation as a standard label-free reliability baseline.
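The label-free mean ensemble and the two calibration metrics named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input layout (one row of per-prompt unsafe probabilities per sample) is an assumption, and the ECE shown is one common equal-width-bin variant for binary scores.

```python
import math


def mean_ensemble(prompt_probs):
    """Average P(unsafe) over prompts; prompt_probs[i][j] is sample i, prompt j."""
    return [sum(row) / len(row) for row in prompt_probs]


def cross_prompt_variance(prompt_probs):
    """Population variance of P(unsafe) across prompts, per sample."""
    out = []
    for row in prompt_probs:
        mean = sum(row) / len(row)
        out.append(sum((p - mean) ** 2 for p in row) / len(row))
    return out


def binary_nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood for binary labels (1 = unsafe)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin by predicted P(unsafe), compare to observed unsafe rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for items in bins:
        if items:
            mean_p = sum(p for p, _ in items) / len(items)
            rate = sum(y for _, y in items) / len(items)
            ece += (len(items) / len(probs)) * abs(mean_p - rate)
    return ece


# Two samples scored under two semantically equivalent prompts.
scores = [[0.2, 0.4], [0.9, 0.7]]
ensemble = mean_ensemble(scores)
variance = cross_prompt_variance(scores)
```

Here `variance` plays the role of the fragility diagnostic: samples whose score moves a lot between equivalent prompts are the ones to distrust. A labeled calibrator such as temperature or Platt scaling could then be fitted on `ensemble` when labels are available.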

LG Apr 30, 2021
Predicting Intraoperative Hypoxemia with Hybrid Inference Sequence Autoencoder Networks

Hanyang Liu, Michael C. Montana, Dingwen Li et al.

We present an end-to-end model that uses streaming physiological time series to predict near-term risk of hypoxemia, a rare but life-threatening condition known to cause serious patient harm during surgery. Inspired by the fact that a hypoxemia event is defined based on a future sequence of low SpO2 (i.e., blood oxygen saturation) instances, we propose the hybrid inference network (hiNet), which makes hybrid inference on both future low SpO2 instances and hypoxemia outcomes. hiNet integrates 1) a joint sequence autoencoder that simultaneously optimizes a discriminative decoder for label prediction, and 2) two auxiliary decoders trained for data reconstruction and forecasting, which seamlessly learn contextual latent representations that capture the transition from present states to future states. All decoders share a memory-based encoder that helps capture the global dynamics of patient measurements. On a large surgical cohort of 72,081 surgeries at a major academic medical center, our model outperforms strong baselines, including the model used by the state-of-the-art hypoxemia prediction system. With its capability to make real-time predictions of near-term hypoxemia at clinically acceptable alarm rates, hiNet shows promise in improving clinical decision making and easing the burden of perioperative care.
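To make concrete the idea that a hypoxemia label derives from a future sequence of low SpO2 readings, the sketch below labels each time step by whether a sufficiently long run of low readings occurs within a look-ahead window. The threshold of 90, the window length, and the minimum run length are hypothetical placeholders; the abstract does not state the clinical definition the authors use.

```python
def has_run(flags, min_run):
    """Return True if flags contains at least min_run consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False


def hypoxemia_labels(spo2, horizon, min_run, threshold=90.0):
    """Label step t as 1 if the next `horizon` readings contain a low-SpO2 run.

    spo2: list of SpO2 readings (percent), one per time step.
    threshold, horizon, min_run: assumed values, not taken from the paper.
    """
    low = [value < threshold for value in spo2]
    labels = []
    for t in range(len(spo2)):
        future = low[t + 1 : t + 1 + horizon]  # strictly future readings
        labels.append(1 if has_run(future, min_run) else 0)
    return labels


# Toy trace with a three-step desaturation in the middle.
trace = [98, 97, 89, 88, 87, 96, 98]
labels = hypoxemia_labels(trace, horizon=3, min_run=2)
```

Under these assumptions, both the per-step low-SpO2 flags and the event labels come from the same future window, which is the kind of paired target the hybrid inference described above predicts jointly.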