Haifeng Lin

CL
h-index1
4papers
30citations
Novelty46%
AI Score34

4 Papers

AIJul 8, 2024
Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

Detian Chu, Linyuan Bai, Jianuo Huang et al.

With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. Empirical evaluations conducted in both simulated and real environment demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.

CVJul 24, 2025Code
VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding

Baoyao Yang, Wanyun Li, Dixin Chen et al.

This paper introduces VideoMind, a video-centric omni-modal dataset designed for deep video content cognition and enhanced multi-modal feature representation. The dataset comprises 103K video samples (3K reserved for testing), each paired with audio and systematically detailed textual descriptions. Specifically, every video and its audio is described across three hierarchical layers (factual, abstract, and intent), progressing from surface to depth. It contains over 22 million words, averaging ~225 words per sample. VideoMind's key distinction from existing datasets is its provision of intent expressions, which require contextual integration across the entire video and are not directly observable. These deep-cognitive expressions are generated using a Chain-of-Thought (COT) approach, prompting the mLLM through step-by-step reasoning. Each description includes annotations for subject, place, time, event, action, and intent, supporting downstream recognition tasks. Crucially, we establish a gold-standard benchmark with 3,000 manually validated samples for evaluating deep-cognitive video understanding. We design hybrid-cognitive retrieval experiments, scored by multi-level retrieval metrics, to appropriately assess deep video comprehension. Evaluation results for models (e.g., InternVideo, VAST, UMT-L) are released. VideoMind serves as a powerful benchmark for fine-grained cross-modal alignment and advances fields requiring in-depth video understanding, such as emotion and intent recognition. The data is publicly available on GitHub, HuggingFace, and OpenDataLab, https://github.com/cdx-cindy/VideoMind.

CLJan 25, 2025
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering

Beiming Liu, Zhizhuo Cui, Siteng Hu et al.

Aerospace manufacturing demands exceptionally high precision in technical parameters. The remarkable performance of Large Language Models (LLMs), such as GPT-4 and QWen, in Natural Language Processing has sparked industry interest in their application to tasks including process design, material selection, and tool information retrieval. However, LLMs are prone to generating "hallucinations" in specialized domains, producing inaccurate or false information that poses significant risks to the quality of aerospace products and flight safety. This paper introduces a set of evaluation metrics tailored for LLMs in aerospace manufacturing, aiming to assess their accuracy by analyzing their performance in answering questions grounded in professional knowledge. Firstly, key information is extracted through in-depth textual analysis of classic aerospace manufacturing textbooks and guidelines. Subsequently, utilizing LLM generation techniques, we meticulously construct multiple-choice questions with multiple correct answers of varying difficulty. Following this, different LLM models are employed to answer these questions, and their accuracy is recorded. Experimental results demonstrate that the capabilities of LLMs in aerospace professional knowledge are in urgent need of improvement. This study provides a theoretical foundation and practical guidance for the application of LLMs in aerospace manufacturing, addressing a critical gap in the field.

CLAug 24, 2020
Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III

Brent Biseda, Gaurav Desai, Haifeng Lin et al.

This paper achieves state of the art results for the ICD code prediction task using the MIMIC-III dataset. This was achieved through the use of Clinical BERT (Alsentzer et al., 2019). embeddings and text augmentation and label balancing to improve F1 scores for both ICD Chapter as well as ICD disease codes. We attribute the improved performance mainly to the use of novel text augmentation to shuffle the order of sentences during training. In comparison to the Top-32 ICD code prediction (Keyang Xu, et. al.) with an F1 score of 0.76, we achieve a final F1 score of 0.75 but on a total of the top 50 ICD codes.