Jianhao Yu

CV
h-index: 34
Papers: 3
Citations: 34
Novelty: 33%
AI Score: 36

3 Papers

CV · Dec 27, 2024 · Code
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios

Jiaqi Fan, Jianhua Wu, Jincheng Gao et al.

Multimodal large language models (MLLMs) have performed well in many autonomous driving tasks. In this paper, MLLMs are used to jointly solve semantic scene understanding and risk localization, relying only on front-view images. In the proposed MLLM-SUL framework, a dual-branch visual encoder first extracts features at two resolutions; the richer visual information helps the language model describe risk objects of different sizes accurately. For language generation, the LLaMA model is fine-tuned to predict scene descriptions covering the type of driving scenario, the actions of risk objects, and the driving intentions and suggestions of the ego-vehicle. Finally, a transformer-based network incorporating a regression token is trained to locate the risk objects. Extensive experiments on the existing DRAMA-ROLISP dataset and the extended DRAMA-SRIS dataset demonstrate that our method is efficient, surpassing many state-of-the-art image-based and video-based methods. Specifically, our method achieves a BLEU-1 score of 80.1% and a CIDEr score of 298.5% in the scene understanding task, and 59.6% accuracy in the localization task. Code and datasets are available at https://github.com/fjq-tongji/MLLM-SUL.
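
For readers unfamiliar with the BLEU-1 metric reported in this abstract, the following is a minimal sketch of how it is commonly computed for one candidate sentence against one reference: clipped unigram precision multiplied by a brevity penalty. The function name `bleu1`, the whitespace tokenization, and the single-reference setup are assumptions made for illustration; the paper's actual evaluation toolkit and tokenization are not stated in the abstract and may differ.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Illustrative BLEU-1: clipped unigram precision times brevity penalty.

    Assumes whitespace tokenization and a single reference sentence.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    cand_counts = Counter(cand)
    # Clip each candidate word count by how often it appears in the reference,
    # so repeating a correct word does not inflate the score.
    matches = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = matches / len(cand)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision


if __name__ == "__main__":
    print(bleu1("the car ahead is braking", "the car ahead is braking hard"))
```

Corpus-level scores such as the one quoted above are normally aggregated over all test sentences before the ratio is taken, so averaging this per-sentence value would not necessarily reproduce a published number.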

NI · Mar 11
A Secure Splitting and Acceleration Strategy for TCP/QUIC in Interplanetary Networks

Jianhao Yu, Ye Li, Qingfang Jiang et al.

Interplanetary networks (IPNs) present unique challenges such as extreme delay, high loss, and frequent disruptions that severely degrade the performance of conventional transport protocols like Transmission Control Protocol (TCP) and Quick UDP Internet Connections (QUIC). To address these issues, we propose a secure transport acceleration strategy tailored for IPNs. This strategy is built on our Non-Transparent Secure Proxy (NTSP) architecture, which enables connection splitting for end-to-end encrypted flows while preserving application-layer security. Based on the NTSP, we design an IPN-aware transport policy that combines (i) a rate-based congestion control algorithm that exploits the pre-scheduled nature of deep-space links to achieve stable and efficient bandwidth utilization, and (ii) an adaptive packet-level forward error correction scheme that provides low-latency loss recovery without retransmissions. Furthermore, we introduce a theoretically grounded backpressure flow control mechanism and derive an analytical model for optimal buffer sizing to mitigate rate mismatch and prevent bufferbloat. The strategy is implemented in a prototype system, PEPspace, and evaluated in representative Earth-Moon scenarios. Results show stable, near-capacity goodput and substantially improved delivery performance compared with TCP/QUIC variants and existing Performance Enhancing Proxies, while maintaining low latency and robust data delivery across intermittent links. The NTSP architecture is further discussed as a foundational framework for future unified IP/DTN architectures, bridging a key architectural gap in heterogeneous space networks.
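
To illustrate what packet-level forward error correction means in this abstract, here is a minimal sketch of the simplest such code: one XOR parity packet per group, which lets the receiver rebuild a single lost packet without asking for a retransmission. This is not the paper's scheme, which is described as adaptive and whose details are not given in the abstract; the function names `make_parity` and `recover`, the fixed group size, and the equal-length packets are all assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """XOR all equal-length packets in a group into one parity packet."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet cannot repair more than one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(group)
    print(recover([b"abcd", None, b"ijkl"], parity))
```

On a link with a round-trip time of several seconds, repairing the loss locally like this avoids waiting at least one round trip for a retransmission, at the cost of sending one extra packet per group.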

CV · Dec 8, 2023
Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Jianhua Wu, Bingzhao Gao, Jincheng Gao et al.

With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs) such as GPT and Sora have achieved remarkable results in many fields, including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can enhance scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on their understanding of driving scenarios, generating feasible scenes for rare events in the long-tail distribution that are unlikely to be encountered during routine driving and data collection. This augmentation can in turn improve the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs lies in World Models, exemplified by the DREAMER series, which showcase the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Models can generate unseen yet plausible driving environments, improving the prediction of road users' behaviors and supporting the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.