Jiachen Zhong

h-index6

6papers

120citations

Novelty38%

AI Score35

Ranked #104,785 of 194,257 authors (top 54%)#35,096 in CV (top 59%)

6 Papers

20.3CVMar 30, 2023Code

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving

Zijian Zhu, Yichi Zhang, Hai Chen et al.

3D object detection is an essential perception task in autonomous driving to understand the environments. The Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks. However, there still lacks a systematic understanding of the robustness of these vision-dependent BEV models, which is closely related to the safety of autonomous driving systems. In this paper, we evaluate the natural and adversarial robustness of various representative models under extensive settings, to fully understand their behaviors influenced by explicit BEV features compared with those without BEV. In addition to the classic settings, we propose a 3D consistent patch attack by applying adversarial patches in the 3D space to guarantee the spatiotemporal consistency, which is more realistic for the scenario of autonomous driving. With substantial experiments, we draw several findings: 1) BEV models tend to be more stable than previous methods under different natural conditions and common corruptions due to the expressive spatial representations; 2) BEV models are more vulnerable to adversarial noises, mainly caused by the redundant BEV features; 3) Camera-LiDAR fusion models have superior performance under different settings with multi-modal inputs, but BEV fusion model is still vulnerable to adversarial noises of both point cloud and image. These findings alert the safety issue in the applications of BEV detectors and could facilitate the development of more robust models.

9.4CVSep 23, 2022Code

WS-3D-Lane: Weakly Supervised 3D Lane Detection With 2D Lane Labels

Jianyong Ai, Wenbo Ding, Jiuhua Zhao et al.

Compared to 2D lanes, real 3D lane data is difficult to collect accurately. In this paper, we propose a novel method for training 3D lanes with only 2D lane labels, called weakly supervised 3D lane detection WS-3D-Lane. By assumptions of constant lane width and equal height on adjacent lanes, we indirectly supervise 3D lane heights in the training. To overcome the problem of the dynamic change of the camera pitch during data collection, a camera pitch self-calibration method is proposed. In anchor representation, we propose a double-layer anchor with a improved non-maximum suppression (NMS) method, which enables the anchor-based method to predict two lane lines that are close. Experiments are conducted on the base of 3D-LaneNet under two supervision methods. Under weakly supervised setting, our WS-3D-Lane outperforms previous 3D-LaneNet: F-score rises to 92.3% on Apollo 3D synthetic dataset, and F1 rises to 74.5% on ONCE-3DLanes. Meanwhile, WS-3D-Lane in purely supervised setting makes more increments and outperforms state-of-the-art. To the best of our knowledge, WS-3D-Lane is the first try of 3D lane detection under weakly supervised setting.

8.4CVJun 20, 2023

Multi-Scale Occ: 4th Place Solution for CVPR 2023 3D Occupancy Prediction Challenge

Yangyang Ding, Luying Huang, Jiachen Zhong

In this report, we present the 4th place solution for CVPR 2023 3D occupancy prediction challenge. We propose a simple method called Multi-Scale Occ for occupancy prediction based on lift-splat-shoot framework, which introduces multi-scale image features for generating better multi-scale 3D voxel features with temporal fusion of multiple past frames. Post-processing including model ensemble, test-time augmentation, and class-wise thresh are adopted to further boost the final performance. As shown on the leaderboard, our proposed occupancy prediction method ranks the 4th place with 49.36 mIoU.

2.0CVDec 13, 2023Code

DualTeacher: Bridging Coexistence of Unlabelled Classes for Semi-supervised Incremental Object Detection

Ziqi Yuan, Liyuan Wang, Wenbo Ding et al.

In real-world applications, an object detector often encounters object instances from new classes and needs to accommodate them effectively. Previous work formulated this critical problem as incremental object detection (IOD), which assumes the object instances of new classes to be fully annotated in incremental data. However, as supervisory signals are usually rare and expensive, the supervised IOD may not be practical for implementation. In this work, we consider a more realistic setting named semi-supervised IOD (SSIOD), where the object detector needs to learn new classes incrementally from a few labelled data and massive unlabelled data without catastrophic forgetting of old classes. A commonly-used strategy for supervised IOD is to encourage the current model (as a student) to mimic the behavior of the old model (as a teacher), but it generally fails in SSIOD because a dominant number of object instances from old and new classes are coexisting and unlabelled, with the teacher only recognizing a fraction of them. Observing that learning only the classes of interest tends to preclude detection of other classes, we propose to bridge the coexistence of unlabelled classes by constructing two teacher models respectively for old and new classes, and using the concatenation of their predictions to instruct the student. This approach is referred to as DualTeacher, which can serve as a strong baseline for SSIOD with limited resource overhead and no extra hyperparameters. We build various benchmarks for SSIOD and perform extensive experiments to demonstrate the superiority of our approach (e.g., the performance lead is up to 18.28 AP on MS-COCO). Our code is available at \url{https://github.com/chuxiuhong/DualTeacher}.

15.2IVJun 8, 2025

A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning

Jiachen Zhong, Yiting Wang, Di Zhu et al.

Lung cancer remains one of the most prevalent and fatal diseases worldwide, demanding accurate and timely diagnosis and treatment. Recent advancements in large AI models have significantly enhanced medical image understanding and clinical decision-making. This review systematically surveys the state-of-the-art in applying large AI models to lung cancer screening, diagnosis, prognosis, and treatment. We categorize existing models into modality-specific encoders, encoder-decoder frameworks, and joint encoder architectures, highlighting key examples such as CLIP, BLIP, Flamingo, BioViL-T, and GLoRIA. We further examine their performance in multimodal learning tasks using benchmark datasets like LIDC-IDRI, NLST, and MIMIC-CXR. Applications span pulmonary nodule detection, gene mutation prediction, multi-omics integration, and personalized treatment planning, with emerging evidence of clinical deployment and validation. Finally, we discuss current limitations in generalizability, interpretability, and regulatory compliance, proposing future directions for building scalable, explainable, and clinically integrated AI systems. Our review underscores the transformative potential of large AI models to personalize and optimize lung cancer care.

7.9LGAug 7, 2020Code

Improving the Speed and Quality of GAN by Adversarial Training

Jiachen Zhong, Xuanqing Liu, Cho-Jui Hsieh

Generative adversarial networks (GAN) have shown remarkable results in image generation tasks. High fidelity class-conditional GAN methods often rely on stabilization techniques by constraining the global Lipschitz continuity. Such regularization leads to less expressive models and slower convergence speed; other techniques, such as the large batch training, require unconventional computing power and are not widely accessible. In this paper, we develop an efficient algorithm, namely FastGAN (Free AdverSarial Training), to improve the speed and quality of GAN training based on the adversarial training technique. We benchmark our method on CIFAR10, a subset of ImageNet, and the full ImageNet datasets. We choose strong baselines such as SNGAN and SAGAN; the results demonstrate that our training algorithm can achieve better generation quality (in terms of the Inception score and Frechet Inception distance) with less overall training time. Most notably, our training algorithm brings ImageNet training to the broader public by requiring 2-4 GPUs.