CVJul 12, 2023Code
Rectifying Noisy Labels with Sequential Prior: Multi-Scale Temporal Feature Affinity Learning for Robust Video SegmentationBeilei Cui, Minqing Zhang, Mengya Xu et al.
Noisy label problems are inevitably in existence within medical image segmentation causing severe performance degradation. Previous segmentation methods for noisy label problems only utilize a single image while the potential of leveraging the correlation between images has been overlooked. Especially for video segmentation, adjacent frames contain rich contextual information beneficial in cognizing noisy labels. Based on two insights, we propose a Multi-Scale Temporal Feature Affinity Learning (MS-TFAL) framework to resolve noisy-labeled medical video segmentation issues. First, we argue the sequential prior of videos is an effective reference, i.e., pixel-level features from adjacent frames are close in distance for the same class and far in distance otherwise. Therefore, Temporal Feature Affinity Learning (TFAL) is devised to indicate possible noisy labels by evaluating the affinity between pixels in two adjacent frames. We also notice that the noise distribution exhibits considerable variations across video, image, and pixel levels. In this way, we introduce Multi-Scale Supervision (MSS) to supervise the network from three different perspectives by re-weighting and refining the samples. This design enables the network to concentrate on clean samples in a coarse-to-fine manner. Experiments with both synthetic and real-world label noise demonstrate that our method outperforms recent state-of-the-art robust segmentation approaches. Code is available at https://github.com/BeileiCui/MS-TFAL.
IVOct 8, 2023
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial IntelligenceJianing Qiu, Jian Wu, Hao Wei et al.
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
CRAug 12, 2023
A One-dimensional HEVC video steganalysis method using the Optimality of Predicted Motion VectorsJun Li, Minqing Zhang, Ke Niu et al.
Among steganalysis techniques, detection against motion vector (MV) domain-based video steganography in High Efficiency Video Coding (HEVC) standard remains a hot and challenging issue. For the purpose of improving the detection performance, this paper proposes a steganalysis feature based on the optimality of predicted MVs with a dimension of one. Firstly, we point out that the motion vector prediction (MVP) of the prediction unit (PU) encoded using the Advanced Motion Vector Prediction (AMVP) technique satisfies the local optimality in the cover video. Secondly, we analyze that in HEVC video, message embedding either using MVP index or motion vector differences (MVD) may destroy the above optimality of MVP. And then, we define the optimal rate of MVP in HEVC video as a steganalysis feature. Finally, we conduct steganalysis detection experiments on two general datasets for three popular steganography methods and compare the performance with four state-of-the-art steganalysis methods. The experimental results show that the proposed optimal rate of MVP for all cover videos is 100\%, while the optimal rate of MVP for all stego videos is less than 100\%. Therefore, the proposed steganography scheme can accurately distinguish between cover videos and stego videos, and it is efficiently applied to practical scenarios with no model training and low computational complexity.
CVSep 27, 2023
CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy GradingHao Wei, Peilun Shi, Juzheng Miao et al.
Diabetic retinopathy (DR) is the most common diabetic complication, which usually leads to retinal damage, vision loss, and even blindness. A computer-aided DR grading system has a significant impact on helping ophthalmologists with rapid screening and diagnosis. Recent advances in fundus photography have precipitated the development of novel retinal imaging cameras and their subsequent implementation in clinical practice. However, most deep learning-based algorithms for DR grading demonstrate limited generalization across domains. This inferior performance stems from variance in imaging protocols and devices inducing domain shifts. We posit that declining model performance between domains arises from learning spurious correlations in the data. Incorporating do-operations from causality analysis into model architectures may mitigate this issue and improve generalizability. Specifically, a novel universal structural causal model (SCM) was proposed to analyze spurious correlations in fundus imaging. Building on this, a causality-inspired diabetic retinopathy grading framework named CauDR was developed to eliminate spurious correlations and achieve more generalizable DR diagnostics. Furthermore, existing datasets were reorganized into 4DR benchmark for DG scenario. Results demonstrate the effectiveness and the state-of-the-art (SOTA) performance of CauDR.
CVMar 16, 2024Code
VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image AnalysisHao Wei, Bowen Liu, Minqing Zhang et al.
Generalist foundation model has ushered in newfound capabilities in medical domain. However, the contradiction between the growing demand for high-quality annotated data with patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned challenge. Here we harness 1 million open-source synthetic fundus images paired with natural language descriptions, to curate an ethical language-image foundation model for retina image analysis named VisionCLIP. VisionCLIP achieves competitive performance on three external datasets compared with the existing method pre-trained on real-world data in a zero-shot fashion. The employment of artificially synthetic images alongside corresponding textual data for training enables the medical foundation model to successfully assimilate knowledge of disease symptomatology, thereby circumventing potential breaches of patient confidentiality.
IVNov 17, 2023
Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTAHao Wei, Peilun Shi, Guitao Bai et al.
Ultra-wide optical coherence tomography angiography (UW-OCTA) is an emerging imaging technique that offers significant advantages over traditional OCTA by providing an exceptionally wide scanning range of up to 24 x 20 $mm^{2}$, covering both the anterior and posterior regions of the retina. However, the currently accessible UW-OCTA datasets suffer from limited comprehensive hierarchical information and corresponding disease annotations. To address this limitation, we have curated the pioneering M3OCTA dataset, which is the first multimodal (i.e., multilayer), multi-disease, and widest field-of-view UW-OCTA dataset. Furthermore, the effective utilization of multi-layer ultra-wide ocular vasculature information from UW-OCTA remains underdeveloped. To tackle this challenge, we propose the first cross-modal fusion framework that leverages multi-modal information for diagnosing multiple diseases. Through extensive experiments conducted on our openly available M3OCTA dataset, we demonstrate the effectiveness and superior performance of our method, both in fixed and varying modalities settings. The construction of the M3OCTA dataset, the first multimodal OCTA dataset encompassing multiple diseases, aims to advance research in the ophthalmic image analysis community.
CRSep 25, 2020
A Reversible Data hiding Scheme in Encrypted Domain for Secret Image Sharing based on Chinese Remainder TheoremYan Ke, Minqing Zhang, Xinpeng Zhang et al.
Reversible data hiding in encrypted domain (RDH-ED) schemes based on symmetric or public key encryption are mainly applied to the security of end-to-end communication. Aimed at providing reliable technical supports for multi-party security scenarios, a separable RDH-ED scheme for secret image sharing based on Chinese remainder theorem (CRT) is presented. In the application of (t, n) secret image sharing, the image is first shared into n different shares of ciphertext. Only when not less than t shares obtained, can the original image be reconstructed. In our scheme, additional data could be embedded into the image shares. To realize data extraction from the image shares and the reconstructed image separably, two data hiding methods are proposed: one is homomorphic difference expansion in encrypted domain (HDE-ED) that supports data extraction from the reconstructed image by utilizing the addition homomorphism of CRT secret sharing; the other is difference expansion in image shares (DE-IS) that supports the data extraction from the marked shares before image reconstruction. Experimental results demonstrate that the proposed scheme could not only maintain the security and the threshold function of secret sharing system, but also obtain a better reversibility and efficiency compared with most existing RDH-ED algorithms. The maximum embedding rate of HDE-ED could reach 0.5000 bits per pixel and the average embedding rate of DE-IS is 0.0545 bits per bit of ciphertext.
CRJun 18, 2019
Recent Advances of Image Steganography with Generative Adversarial NetworksJia Liu, Yan Ke, Yu Lei et al.
In the past few years, the Generative Adversarial Network (GAN) which proposed in 2014 has achieved great success. GAN has achieved many research results in the field of computer vision and natural language processing. Image steganography is dedicated to hiding secret messages in digital images, and has achieved the purpose of covert communication. Recently, research on image steganography has demonstrated great potential for using GAN and neural networks. In this paper we review different strategies for steganography such as cover modification, cover selection and cover synthesis by GANs, and discuss the characteristics of these methods as well as evaluation metrics and provide some possible future research directions in image steganography.
MMApr 26, 2018
Generative Steganography by SamplingZhuo Zhang, Jia Liu, Yan Ke et al.
In this paper, a novel data-driven information hiding scheme called generative steganography by sampling (GSS) is proposed. Unlike in traditional modification-based steganography, in our method the stego image is directly sampled by a powerful generator: no explicit cover is used. Both parties share a secret key used for message embedding and extraction. The Jensen-Shannon divergence is introduced as a new criterion for evaluating the security of generative steganography. Based on these principles, we propose a simple practical generative steganography method that uses semantic image inpainting. The message is written in advance to an uncorrupted region that needs to be retained in the corrupted image. Then, the corrupted image with the secret message is fed into a Generator trained by a generative adversarial network (GAN) for semantic completion. Message loss and prior loss terms are proposed for penalizing message extraction error and unrealistic stego image. In our design, we first train a generator whose training target is the generation of new data samples from the same distribution as that of existing training data. Next, for the trained generator, backpropagation to the message and prior loss are introduced to optimize the coding of the input noise data for the generator. The presented experiments demonstrate the potential of the proposed framework based on both qualitative and quantitative evaluations of the generated stego images.
CRApr 18, 2018
The Reincarnation of Grille Cipher: A Generative ApproachJia Liu, Yan Ke, Yu Lei et al.
In order to keep the data secret, various techniques have been implemented to encrypt and decrypt the secret data. Cryptography is committed to the security of content, i.e. it cannot be restored with a given ciphertext. Steganography is to hiding the existence of a communication channel within a stego. However, it has been difficult to construct a cipher (cypher) that simultaneously satisfy both channel and content security for secure communication. Inspired by the Cardan grille, this paper presents a new generative framework for grille cipher. A digital cardan grille is used for message encryption and decryption. The ciphertext is directly sampled by a powerful generator without an explicit cover. Message loss and prior loss are proposed for penalizing message extraction error and unrealistic ciphertext. Jensen-Shannon Divergence is introduced as new criteria for channel security. A simple practical data-driven grille cipher is proposed using semantic image inpainting and generative adversarial network. Experimental results demonstrate the promising of the proposed method.
MMMar 25, 2018
Digital Cardan Grille: A Modern Approach for Information HidingJia Liu, Tanping Zhou, Zhuo Zhang et al.
In this paper, a new framework for construction of Cardan grille for information hiding is proposed. Based on the semantic image inpainting technique, the stego image are driven by secret messages directly. A mask called Digital Cardan Grille (DCG) for determining the hidden location is introduced to hide the message. The message is written to the corrupted region that needs to be filled in the corrupted image in advance. Then the corrupted image with secret message is feeded into a Generative Adversarial Network (GAN) for semantic completion. The adversarial game not only reconstruct the corrupted image , but also generate a stego image which contains the logic rationality of image content. The experimental results verify the feasibility of the proposed method.
MMNov 14, 2017
Generative Steganography with Kerckhoffs' PrincipleYan Ke, Minqing Zhang, Jia Liu et al.
The distortion in steganography that usually comes from the modification or recoding on the cover image during the embedding process leaves the steganalyzer with possibility of discriminating. Faced with such a risk, we propose generative steganography with Kerckhoffs' principle (GSK) in this letter. In GSK, the secret messages are generated by a cover image using a generator rather than embedded into the cover, thus resulting in no modifications in the cover. To ensure the security, the generators are trained to meet Kerckhoffs' principle based on generative adversarial networks (GAN). Everything about the GSK system, except the extraction key, is public knowledge for the receivers. The secret messages can be outputted by the generator if and only if the extraction key and the cover image are both inputted. In the generator training procedures, there are two GANs, Message- GAN and Cover-GAN, designed to work jointly making the generated results under the control of the extraction key and the cover image. We provide experimental results on the training process and give an example of the working process by adopting a generator trained on MNIST, which demonstrate that GSK can use a cover image without any modification to generate messages, and without the extraction key or the cover image, only meaningless results would be obtained.