CV · Mar 19
CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models
Xiang Chen, Fangfang Yang, Chunlei Meng et al.
Medical vision-language models (MVLMs) are increasingly used as perceptual backbones in radiology pipelines and as the visual front end of multimodal assistants, yet their reliability under real clinical workflows remains underexplored. Prior robustness evaluations often assume clean, curated inputs or study isolated corruptions, overlooking routine acquisition, reconstruction, display, and delivery operations that preserve clinical readability while shifting image statistics. To address this gap, we propose CoDA, a chain-of-distribution framework that constructs clinically plausible pipeline shifts by composing acquisition-like shading, reconstruction and display remapping, and delivery and export degradations. Under masked structural-similarity constraints, CoDA jointly optimizes stage compositions and parameters to induce failures while preserving visual plausibility. Across brain MRI, chest X-ray, and abdominal CT, CoDA substantially degrades the zero-shot performance of CLIP-style MVLMs, with chained compositions consistently more damaging than any single stage. We also evaluate multimodal large language models (MLLMs) as technical-authenticity auditors of imaging realism and quality rather than pathology. Proprietary multimodal models show degraded auditing reliability and persistent high-confidence errors on CoDA-shifted samples, while the medical-specific MLLMs we test exhibit clear deficiencies in medical image quality auditing. Finally, we introduce a post-hoc repair strategy based on teacher-guided token-space adaptation with patch-level alignment, which improves accuracy on archived CoDA outputs. Overall, our findings characterize a clinically grounded threat surface for MVLM deployment and show that lightweight alignment improves robustness in deployment.
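The idea of chaining pipeline stages under a masked structural-similarity constraint can be illustrated with a small sketch. This is not the authors' implementation: the three stage functions, the single-window `masked_ssim`, the exhaustive `search_chain`, the `min_ssim` threshold, and the `score_fn` stand-in for a model's confidence are all hypothetical simplifications chosen for illustration.

```python
def shade(img, strength):
    """Acquisition-like shading: smooth left-to-right multiplicative gain (assumed form)."""
    w = len(img[0])
    return [[min(1.0, max(0.0, px * (1.0 + strength * (x / max(w - 1, 1) - 0.5))))
             for x, px in enumerate(row)] for row in img]

def remap(img, gamma):
    """Display remapping: gamma curve on intensities in [0, 1] (assumed form)."""
    return [[px ** gamma for px in row] for row in img]

def quantize(img, levels):
    """Export degradation: reduce the number of gray levels (assumed form)."""
    return [[round(px * (levels - 1)) / (levels - 1) for px in row] for row in img]

def masked_ssim(a, b, mask, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM computed only over pixels where mask is True."""
    xs = [a[y][x] for y in range(len(a)) for x in range(len(a[0])) if mask[y][x]]
    ys = [b[y][x] for y in range(len(a)) for x in range(len(a[0])) if mask[y][x]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def apply_chain(img, params):
    """Compose the three stages in pipeline order."""
    out = shade(img, params["strength"])
    out = remap(out, params["gamma"])
    return quantize(out, params["levels"])

def search_chain(img, mask, score_fn, candidates, min_ssim=0.9):
    """Pick the candidate that minimizes score_fn while keeping masked SSIM >= min_ssim."""
    best = None
    for params in candidates:
        shifted = apply_chain(img, params)
        if masked_ssim(img, shifted, mask) < min_ssim:
            continue  # reject shifts that violate the similarity constraint
        score = score_fn(shifted)
        if best is None or score < best[0]:
            best = (score, params, shifted)
    return best
```

In practice `score_fn` would wrap the target model's score for the correct class, and the search over stage compositions and parameters would be an optimizer rather than a loop over a fixed candidate list.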
CV · Aug 5, 2021
Automatic Rail Component Detection Based on AttnConv-Net
Tiange Wang, Zijun Zhang, Fangfang Yang et al.
Automatic detection of major rail components from railway images helps ensure rail transport safety. In this paper, we propose an attention-powered deep convolutional network (AttnConv-net) to detect multiple rail components, including the rail, clips, and bolts. The proposed method consists of a deep convolutional neural network (DCNN) as the backbone, cascading attention blocks (CABs), and two feed-forward networks (FFNs). Two types of positional embedding are applied to enrich the information in the latent features extracted by the backbone. Based on the processed latent features, the CABs learn the local context of rail components, including their categories and boundaries. Final categories and bounding boxes are generated by the two FFNs, which run in parallel. To enhance the detection of small components, various data augmentation methods are employed during training. The effectiveness of the proposed AttnConv-net is validated on one real dataset and one synthesized dataset. Compared with classic convolutional neural network based methods, our proposed method simplifies the detection pipeline by eliminating the need for pre- and post-processing, offering a new speed-quality trade-off that enables faster and more accurate image-based rail component detection.
CV · Aug 5, 2021
Intelligent Railway Foreign Object Detection: A Semi-supervised Convolutional Autoencoder Based Method
Tiange Wang, Zijun Zhang, Fangfang Yang et al.
Automated inspection and detection of foreign objects on railways is important for rail transportation safety, as it helps prevent potential accidents and train derailments. Most existing vision-based approaches focus on detecting frontal intrusion objects with prior labels, such as the categories and locations of the objects. In reality, foreign objects of unknown categories can appear on railway tracks at any time. In this paper, we develop a semi-supervised convolutional autoencoder based framework that requires only railway track images for training, without prior knowledge of the foreign objects. It consists of three modules: a bottleneck feature generator as the encoder, a photographic image generator as the decoder, and a reconstruction discriminator developed via adversarial learning. In the proposed framework, the presence, location, and shape of foreign objects are detected by comparing the input and reconstructed images and by setting thresholds on the reconstruction errors. The proposed method is evaluated through comprehensive studies under different performance criteria. The results show that it outperforms several well-known benchmark methods. The proposed framework is useful for data analytics in train Internet-of-Things (IoT) systems.
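The detection step, comparing the input with its reconstruction and thresholding the error, can be sketched as follows. This is a minimal illustration rather than the paper's method: the autoencoder itself is omitted (the reconstruction is passed in directly), and the function names, `pixel_thresh`, and `min_pixels` are hypothetical.

```python
def error_map(original, reconstructed):
    """Per-pixel absolute reconstruction error between two same-sized grayscale images."""
    return [[abs(o - r) for o, r in zip(row_o, row_r)]
            for row_o, row_r in zip(original, reconstructed)]

def detect_foreign_object(original, reconstructed, pixel_thresh=0.3, min_pixels=2):
    """Threshold the error map to estimate presence, location, and shape.

    Returns a dict with:
      present: True if at least min_pixels pixels exceed pixel_thresh
      bbox:    (top, left, bottom, right) of the flagged pixels, or None
      mask:    boolean grid marking flagged pixels (a rough shape estimate)
    """
    err = error_map(original, reconstructed)
    mask = [[e > pixel_thresh for e in row] for row in err]
    coords = [(y, x) for y, row in enumerate(mask) for x, hit in enumerate(row) if hit]
    if len(coords) < min_pixels:
        return {"present": False, "bbox": None, "mask": mask}
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return {"present": True, "bbox": (min(ys), min(xs), max(ys), max(xs)), "mask": mask}
```

The sketch relies on the premise stated in the abstract: a model trained only on clean track images reconstructs the track but not the unfamiliar object, so the object shows up as a region of high reconstruction error.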
LG · Jun 10, 2020
Adversarial Attacks on Brain-Inspired Hyperdimensional Computing-Based Classifiers
Fangfang Yang, Shaolei Ren
An emerging class of in-memory computing architecture, brain-inspired hyperdimensional computing (HDC) mimics brain cognition and leverages random hypervectors (i.e., vectors with a dimensionality of thousands or even more) to represent features and to perform classification tasks. The unique hypervector representation enables HDC classifiers to exhibit high energy efficiency, low inference latency, and strong robustness against hardware-induced bit errors. Consequently, they have been increasingly recognized as an appealing alternative to, or even replacement for, traditional deep neural networks (DNNs) for local on-device classification, especially on low-power Internet of Things devices. Nonetheless, unlike their DNN counterparts, state-of-the-art designs for HDC classifiers are mostly security-oblivious, casting doubt on their safety and immunity to adversarial inputs. In this paper, we study for the first time adversarial attacks on HDC classifiers and show that HDC classifiers can be vulnerable to even minimally perturbed adversarial samples. Concretely, using handwritten digit classification as an example, we construct an HDC classifier and formulate a grey-box attack problem, where the attacker's goal is to mislead the target HDC classifier into producing erroneous prediction labels while keeping the added perturbation as small as possible. Then, we propose a modified genetic algorithm to generate adversarial samples within a reasonably small number of queries. Our results show that adversarial images generated by our algorithm can successfully mislead the HDC classifier into producing wrong prediction labels with a high probability (i.e., 78% when the HDC classifier uses a fixed majority rule for decision). Finally, we also present two defense strategies, adversarial training and retraining, to strengthen the security of HDC classifiers.
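For readers unfamiliar with HDC, a classifier built on random hypervectors and a fixed majority rule can be sketched as below. The abstract does not specify the encoding, so this sketch assumes a common position-value binding scheme with bipolar hypervectors; the class name, the dimensionality `D`, and the tie-breaking rule are hypothetical, and the paper's genetic-algorithm attack is not shown.

```python
import random

D = 2000  # hypervector dimensionality (assumed; "thousands or even more")

def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]

def sign(v):
    """Fixed majority rule: non-negative sums map to +1, negative sums to -1."""
    return 1 if v >= 0 else -1

class HDCClassifier:
    def __init__(self, n_features, n_levels, seed=0):
        rng = random.Random(seed)
        self.pos = [random_hv(rng) for _ in range(n_features)]  # one per feature position
        self.val = [random_hv(rng) for _ in range(n_levels)]    # one per quantized value
        self.classes = {}

    def encode(self, sample):
        """Bind each position with its value (elementwise product), then bundle by majority."""
        acc = [0] * D
        for i, level in enumerate(sample):
            p, v = self.pos[i], self.val[level]
            for d in range(D):
                acc[d] += p[d] * v[d]
        return [sign(a) for a in acc]

    def fit(self, samples, labels):
        """Class hypervector = majority over the encodings of that class's samples."""
        sums = {}
        for s, y in zip(samples, labels):
            hv = self.encode(s)
            acc = sums.setdefault(y, [0] * D)
            for d in range(D):
                acc[d] += hv[d]
        self.classes = {y: [sign(a) for a in acc] for y, acc in sums.items()}

    def predict(self, sample):
        """Return the label whose class hypervector is most similar (dot product)."""
        hv = self.encode(sample)
        return max(self.classes,
                   key=lambda y: sum(a * b for a, b in zip(hv, self.classes[y])))
```

A query-based attack such as the one described above would repeatedly call `predict` on perturbed inputs, searching for a small perturbation that flips the returned label.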