Menghan Hu

CV
h-index98
22papers
559citations
Novelty34%
AI Score49

22 Papers

IVJun 11, 2023Code
TransMRSR: Transformer-based Self-Distilled Generative Prior for Brain MRI Super-Resolution

Shan Huang, Xiaohong Liu, Tao Tan et al.

Magnetic resonance images (MRI) acquired with low through-plane resolution compromise time and cost. The poor resolution in one orientation is insufficient to meet the requirement of high resolution for early diagnosis of brain disease and morphometric study. The common Single image super-resolution (SISR) solutions face two main challenges: (1) local detailed and global anatomical structural information combination; and (2) large-scale restoration when applied for reconstructing thick-slice MRI into high-resolution (HR) iso-tropic data. To address these problems, we propose a novel two-stage network for brain MRI SR named TransMRSR based on the convolutional blocks to extract local information and transformer blocks to capture long-range dependencies. TransMRSR consists of three modules: the shallow local feature extraction, the deep non-local feature capture, and the HR image reconstruction. We perform a generative task to encapsulate diverse priors into a generative network (GAN), which is the decoder sub-module of the deep non-local feature capture part, in the first stage. The pre-trained GAN is used for the second stage of SR task. We further eliminate the potential latent space shift caused by the two-stage training strategy through the self-distilled truncation trick. The extensive experiments show that our method achieves superior performance to other SSIR methods on both public and private datasets. Code is released at https://github.com/goddesshs/TransMRSR.git .

IVFeb 27, 2023
EDMAE: An Efficient Decoupled Masked Autoencoder for Standard View Identification in Pediatric Echocardiography

Yiman Liu, Xiaoxiang Han, Tongtong Liang et al.

This paper introduces the Efficient Decoupled Masked Autoencoder (EDMAE), a novel self-supervised method for recognizing standard views in pediatric echocardiography. EDMAE introduces a new proxy task based on the encoder-decoder structure. The EDMAE encoder is composed of a teacher and a student encoder. The teacher encoder extracts the potential representation of the masked image blocks, while the student encoder extracts the potential representation of the visible image blocks. The loss is calculated between the feature maps output by the two encoders to ensure consistency in the latent representations they extract. EDMAE uses pure convolution operations instead of the ViT structure in the MAE encoder. This improves training efficiency and convergence speed. EDMAE is pre-trained on a large-scale private dataset of pediatric echocardiography using self-supervised learning, and then fine-tuned for standard view recognition. The proposed method achieves high classification accuracy in 27 standard views of pediatric echocardiography. To further verify the effectiveness of the proposed method, the authors perform another downstream task of cardiac ultrasound segmentation on the public dataset CAMUS. The experimental results demonstrate that the proposed method outperforms some popular supervised and recent self-supervised methods, and is more competitive on different downstream tasks.

IVApr 17, 2025Code
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

Xin Li, Kun Yuan, Bingchen Li et al.

This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating reliance on model ensembles, redundant weights, and other computationally expensive components in the previous IQA/VQA competitions. Track 2 introduces a new short-form UGC dataset tailored for single image super-resolution, i.e., the KwaiSR dataset. It consists of 1,800 synthetically generated S-UGC image pairs and 1,900 real-world S-UGC images, which are split into training, validation, and test sets using a ratio of 8:1:1. The primary objective of the challenge is to drive research that benefits the user experience of short-form UGC platforms such as Kwai and TikTok. This challenge attracted 266 participants and received 18 valid final submissions with corresponding fact sheets, significantly contributing to the progress of short-form UGC VQA and image superresolution. The project is publicly available at https://github.com/lixinustc/KVQE- ChallengeCVPR-NTIRE2025.

19.1CVApr 13
Video-based Heart Rate Estimation with Angle-guided ROI Optimization and Graph Signal Denoising

Gan Pei, Junhao Ning, Boqiu Shen et al.

Remote photoplethysmography (rPPG) enables non-contact heart rate measurement from facial videos, but its performance is significantly degraded by facial motions such as speaking and head shaking. To address this issue, we propose two plug-and-play modules. The Angle-guided ROI Adaptive Optimization module quantifies ROI-Camera angles to refine motion-affected signals and capture global motion, while the Multi-region Joint Graph Signal Denoising module jointly models intra- and inter-regional ROI signals using graph signal processing to suppress motion artifacts. The modules are compatible with reflection model-based rPPG methods and validated on three public datasets. Results show that jointly use markedly reduces MAE, with an average decrease of 20.38\% over the baseline, while ablation studies confirm the effectiveness of each module. The work demonstrates the potential of angle-guided optimization and graph-based denoising to enhance rPPG performance in motion scenarios.

IVJul 16, 2025Code
CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos

Wei Sun, Linhan Cao, Kang Fu et al.

Video compression is a standard procedure applied to all videos to minimize storage and transmission demands while preserving visual quality as much as possible. Therefore, evaluating the visual quality of compressed videos is crucial for guiding the practical usage and further development of video compression algorithms. Although numerous compressed video quality assessment (VQA) methods have been proposed, they often lack the generalization capability needed to handle the increasing diversity of video types, particularly high dynamic range (HDR) content. In this paper, we introduce CompressedVQA-HDR, an effective VQA framework designed to address the challenges of HDR video quality assessment. Specifically, we adopt the Swin Transformer and SigLip 2 as the backbone networks for the proposed full-reference (FR) and no-reference (NR) VQA models, respectively. For the FR model, we compute deep structural and textural similarities between reference and distorted frames using intermediate-layer features extracted from the Swin Transformer as its quality-aware feature representation. For the NR model, we extract the global mean of the final-layer feature maps from SigLip 2 as its quality-aware representation. To mitigate the issue of limited HDR training data, we pre-train the FR model on a large-scale standard dynamic range (SDR) VQA dataset and fine-tune it on the HDRSDR-VQA dataset. For the NR model, we employ an iterative mixed-dataset training strategy across multiple compressed VQA datasets, followed by fine-tuning on the HDRSDR-VQA dataset. Experimental results show that our models achieve state-of-the-art performance compared to existing FR and NR VQA models. Moreover, CompressedVQA-HDR-FR won first place in the FR track of the Generalizable HDR & SDR Video Quality Measurement Grand Challenge at IEEE ICME 2025. The code is available at https://github.com/sunwei925/CompressedVQA-HDR.

LGFeb 2, 2024Code
Few-Shot Class-Incremental Learning with Prior Knowledge

Wenhao Jiang, Duo Li, Menghan Hu et al.

To tackle the issues of catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL), previous work has primarily concentrated on preserving the memory of old knowledge during the incremental phase. The role of pre-trained model in shaping the effectiveness of incremental learning is frequently underestimated in these studies. Therefore, to enhance the generalization ability of the pre-trained model, we propose Learning with Prior Knowledge (LwPK) by introducing nearly free prior knowledge from a few unlabeled data of subsequent incremental classes. We cluster unlabeled incremental class samples to produce pseudo-labels, then jointly train these with labeled base class samples, effectively allocating embedding space for both old and new class data. Experimental results indicate that LwPK effectively enhances the model resilience against catastrophic forgetting, with theoretical analysis based on empirical risk minimization and class distance measurement corroborating its operational principles. The source code of LwPK is publicly available at: \url{https://github.com/StevenJ308/LwPK}.

CVJun 27, 2025Code
Quality Assessment and Distortion-aware Saliency Prediction for AI-Generated Omnidirectional Images

Liu Yang, Huiyu Duan, Jiarui Wang et al.

With the rapid advancement of Artificial Intelligence Generated Content (AIGC) techniques, AI generated images (AIGIs) have attracted widespread attention, among which AI generated omnidirectional images (AIGODIs) hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications. AI generated omnidirectional images exhibit unique quality issues, however, research on the quality assessment and optimization of AI-generated omnidirectional images is still lacking. To this end, this work first studies the quality assessment and distortion-aware saliency prediction problems for AIGODIs, and further presents a corresponding optimization process. Specifically, we first establish a comprehensive database to reflect human feedback for AI-generated omnidirectionals, termed OHF2024, which includes both subjective quality ratings evaluated from three perspectives and distortion-aware salient regions. Based on the constructed OHF2024 database, we propose two models with shared encoders based on the BLIP-2 model to evaluate the human visual experience and predict distortion-aware saliency for AI-generated omnidirectional images, which are named as BLIP2OIQA and BLIP2OISal, respectively. Finally, based on the proposed models, we present an automatic optimization process that utilizes the predicted visual experience scores and distortion regions to further enhance the visual quality of an AI-generated omnidirectional image. Extensive experiments show that our BLIP2OIQA model and BLIP2OISal model achieve state-of-the-art (SOTA) results in the human visual experience evaluation task and the distortion-aware saliency prediction task for AI generated omnidirectional images, and can be effectively used in the optimization process. The database and codes will be released on https://github.com/IntMeGroup/AIGCOIQA to facilitate future research.

CVJan 9, 2024Code
Uncertainty-aware Sampling for Long-tailed Semi-supervised Learning

Kuo Yang, Duo Li, Menghan Hu et al.

For semi-supervised learning with imbalance classes, the long-tailed distribution of data will increase the model prediction bias toward dominant classes, undermining performance on less frequent classes. Existing methods also face challenges in ensuring the selection of sufficiently reliable pseudo-labels for model training and there is a lack of mechanisms to adjust the selection of more reliable pseudo-labels based on different training stages. To mitigate this issue, we introduce uncertainty into the modeling process for pseudo-label sampling, taking into account that the model performance on the tailed classes varies over different training stages. For example, at the early stage of model training, the limited predictive accuracy of model results in a higher rate of uncertain pseudo-labels. To counter this, we propose an Uncertainty-Aware Dynamic Threshold Selection (UDTS) approach. This approach allows the model to perceive the uncertainty of pseudo-labels at different training stages, thereby adaptively adjusting the selection thresholds for different classes. Compared to other methods such as the baseline method FixMatch, UDTS achieves an increase in accuracy of at least approximately 5.26%, 1.75%, 9.96%, and 1.28% on the natural scene image datasets CIFAR10-LT, CIFAR100-LT, STL-10-LT, and the medical image dataset TissueMNIST, respectively. The source code of UDTS is publicly available at: https://github.com/yangk/UDTS.

CVMar 26, 2024
Deepfake Generation and Detection: A Benchmark and Survey

Gan Pei, Jiangning Zhang, Menghan Hu et al.

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few. With the advancements in deep learning, techniques primarily represented by Variational Autoencoders and Generative Adversarial Networks have achieved impressive generation results. More recently, the emergence of diffusion models with powerful generation capabilities has sparked a renewed wave of research. In addition to deepfake generation, corresponding detection technologies continuously evolve to regulate the potential misuse of deepfakes, such as for privacy invasion and phishing attacks. This survey comprehensively reviews the latest developments in deepfake generation and detection, summarizing and analyzing current state-of-the-arts in this rapidly evolving field. We first unify task definitions, comprehensively introduce datasets and metrics, and discuss developing technologies. Then, we discuss the development of several related sub-fields and focus on researching four representative deepfake fields: face swapping, face reenactment, talking face generation, and facial attribute editing, as well as forgery detection. Subsequently, we comprehensively benchmark representative methods on popular datasets for each field, fully evaluating the latest and influential published works. Finally, we analyze challenges and future research directions of the discussed fields.

CVApr 1, 2024
AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

Liu Yang, Huiyu Duan, Long Teng et al.

In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among the AIGC, AI generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distortions compared to natural omnidirectional images, however, there is no dedicated Image Quality Assessment (IQA) criteria for assessing them. This study addresses this gap by establishing a large-scale AI generated omnidirectional image IQA database named AIGCOIQA2024 and constructing a comprehensive benchmark. We first generate 300 omnidirectional images based on 5 AIGC models utilizing 25 text prompts. A subjective IQA experiment is conducted subsequently to assess human visual preferences from three perspectives including quality, comfortability, and correspondence. Finally, we conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database. The database will be released to facilitate future research.

IVDec 1, 2024
A Semi-Supervised Approach with Error Reflection for Echocardiography Segmentation

Xiaoxiang Han, Yiman Liu, Jiang Shang et al.

Segmenting internal structure from echocardiography is essential for the diagnosis and treatment of various heart diseases. Semi-supervised learning shows its ability in alleviating annotations scarcity. While existing semi-supervised methods have been successful in image segmentation across various medical imaging modalities, few have attempted to design methods specifically addressing the challenges posed by the poor contrast, blurred edge details and noise of echocardiography. These characteristics pose challenges to the generation of high-quality pseudo-labels in semi-supervised segmentation based on Mean Teacher. Inspired by human reflection on erroneous practices, we devise an error reflection strategy for echocardiography semi-supervised segmentation architecture. The process triggers the model to reflect on inaccuracies in unlabeled image segmentation, thereby enhancing the robustness of pseudo-label generation. Specifically, the strategy is divided into two steps. The first step is called reconstruction reflection. The network is tasked with reconstructing authentic proxy images from the semantic masks of unlabeled images and their auxiliary sketches, while maximizing the structural similarity between the original inputs and the proxies. The second step is called guidance correction. Reconstruction error maps decouple unreliable segmentation regions. Then, reliable data that are more likely to occur near high-density areas are leveraged to guide the optimization of unreliable data potentially located around decision boundaries. Additionally, we introduce an effective data augmentation strategy, termed as multi-scale mixing up strategy, to minimize the empirical distribution gap between labeled and unlabeled images and perceive diverse scales of cardiac anatomical structures. Extensive experiments demonstrate the competitiveness of the proposed method.

IVJun 28, 2025
ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge

Yixu Chen, Bowen Chen, Hai Wei et al.

This paper reports IEEE International Conference on Multimedia \& Expo (ICME) 2025 Grand Challenge on Generalizable HDR and SDR Video Quality Measurement. With the rapid development of video technology, especially High Dynamic Range (HDR) and Standard Dynamic Range (SDR) contents, the need for robust and generalizable Video Quality Assessment (VQA) methods has become increasingly demanded. Existing VQA models often struggle to deliver consistent performance across varying dynamic ranges, distortion types, and diverse content. This challenge was established to benchmark and promote VQA approaches capable of jointly handling HDR and SDR content. In the final evaluation phase, five teams submitted seven models along with technical reports to the Full Reference (FR) and No Reference (NR) tracks. Among them, four methods outperformed VMAF baseline, while the top-performing model achieved state-of-the-art performance, setting a new benchmark for generalizable video quality assessment.

CVMar 14, 2025
Active Learning from Scene Embeddings for End-to-End Autonomous Driving

Wenhao Jiang, Duo Li, Menghan Hu et al.

In the field of autonomous driving, end-to-end deep learning models show great potential by learning driving decisions directly from sensor data. However, training these models requires large amounts of labeled data, which is time-consuming and expensive. Considering that the real-world driving data exhibits a long-tailed distribution where simple scenarios constitute a majority part of the data, we are thus inspired to identify the most challenging scenarios within it. Subsequently, we can efficiently improve the performance of the model by training with the selected data of the highest value. Prior research has focused on the selection of valuable data by empirically designed strategies. However, manually designed methods suffer from being less generalizable to new data distributions. Observing that the BEV (Bird's Eye View) features in end-to-end models contain all the information required to represent the scenario, we propose an active learning framework that relies on these vectorized scene-level features, called SEAD. The framework selects initial data based on driving-environmental information and incremental data based on BEV features. Experiments show that we only need 30\% of the nuScenes training data to achieve performance close to what can be achieved with the full dataset. The source code will be released.

SDAug 7, 2021
Cough Detection Using Selected Informative Features from Audio Signals

Xinru Chen, Menghan Hu, Guangtao Zhai

Cough is a common symptom of respiratory and lung diseases. Cough detection is important to prevent, assess and control epidemic, such as COVID-19. This paper proposes a model to detect cough events from cough audio signals. The models are trained by the dataset combined ESC-50 dataset with self-recorded cough recordings. The test dataset contains inpatient cough recordings collected from inpatients of the respiratory disease department in Ruijin Hospital. We totally build 15 cough detection models based on different feature numbers selected by Random Frog, Uninformative Variable Elimination (UVE), and Variable influence on projection (VIP) algorithms respectively. The optimal model is based on 20 features selected from Mel Frequency Cepstral Coefficients (MFCC) features by UVE algorithm and classified with Support Vector Machine (SVM) linear two-class classifier. The best cough detection model realizes the accuracy, recall, precision and F1-score with 94.9%, 97.1%, 93.1% and 0.95 respectively. Its excellent performance with fewer dimensionality of the feature vector shows the potential of being applied to mobile devices, such as smartphones, thus making cough detection remote and non-contact.

MMJul 27, 2021
Angel's Girl for Blind Painters: an Efficient Painting Navigation System Validated by Multimodal Evaluation Approach

Hang Liu, Menghan Hu, Yuzhen Chen et al.

For people who ardently love painting but unfortunately have visual impairments, holding a paintbrush to create a work is a very difficult task. People in this special group are eager to pick up the paintbrush, like Leonardo da Vinci, to create and make full use of their own talents. Therefore, to maximally bridge this gap, we propose a painting navigation system to assist blind people in painting and artistic creation. The proposed system is composed of cognitive system and guidance system. The system adopts drawing board positioning based on QR code, brush navigation based on target detection and bush real-time positioning. Meanwhile, this paper uses human-computer interaction on the basis of voice and a simple but efficient position information coding rule. In addition, we design a criterion to efficiently judge whether the brush reaches the target or not. According to the experimental results, the thermal curves extracted from the faces of testers show that it is relatively well accepted by blindfolded and even blind testers. With the prompt frequency of 1s, the painting navigation system performs best with the completion degree of 89% with SD of 8.37% and overflow degree of 347% with SD of 162.14%. Meanwhile, the excellent and good types of brush tip trajectory account for 74%, and the relative movement distance is 4.21 with SD of 2.51. This work demonstrates that it is practicable for the blind people to feel the world through the brush in their hands. In the future, we plan to deploy Angle's Eyes on the phone to make it more portable. The demo video of the proposed painting navigation system is available at: https://doi.org/10.6084/m9.figshare.9760004.v1.

IVOct 20, 2020
Claw U-Net: A Unet-based Network with Deep Feature Concatenation for Scleral Blood Vessel Segmentation

Chang Yao, Jingyu Tang, Menghan Hu et al.

Sturge-Weber syndrome (SWS) is a vascular malformation disease, and it may cause blindness if the patient's condition is severe. Clinical results show that SWS can be divided into two types based on the characteristics of scleral blood vessels. Therefore, how to accurately segment scleral blood vessels has become a significant problem in computer-aided diagnosis. In this research, we propose to continuously upsample the bottom layer's feature maps to preserve image details, and design a novel Claw UNet based on UNet for scleral blood vessel segmentation. Specifically, the residual structure is used to increase the number of network layers in the feature extraction stage to learn deeper features. In the decoding stage, by fusing the features of the encoding, upsampling, and decoding parts, Claw UNet can achieve effective segmentation in the fine-grained regions of scleral blood vessels. To effectively extract small blood vessels, we use the attention mechanism to calculate the attention coefficient of each position in images. Claw UNet outperforms other UNet-based networks on scleral blood vessel image dataset.

CVOct 20, 2020
Identification of deep breath while moving forward based on multiple body regions and graph signal analysis

Yunlu Wang, Cheng Yang, Menghan Hu et al.

This paper presents an unobtrusive solution that can automatically identify deep breath when a person is walking past the global depth camera. Existing non-contact breath assessments achieve satisfactory results under restricted conditions when human body stays relatively still. When someone moves forward, the breath signals detected by depth camera are hidden within signals of trunk displacement and deformation, and the signal length is short due to the short stay time, posing great challenges for us to establish models. To overcome these challenges, multiple region of interests (ROIs) based signal extraction and selection method is proposed to automatically obtain the signal informative to breath from depth video. Subsequently, graph signal analysis (GSA) is adopted as a spatial-temporal filter to wipe the components unrelated to breath. Finally, a classifier for identifying deep breath is established based on the selected breath-informative signal. In validation experiments, the proposed approach outperforms the comparative methods with the accuracy, precision, recall and F1 of 75.5%, 76.2%, 75.0% and 75.2%, respectively. This system can be extended to public places to provide timely and ubiquitous help for those who may have or are going through physical or mental trouble.

CVOct 9, 2020
Face Mask Assistant: Detection of Face Mask Service Stage Based on Mobile Phone

Yuzhen Chen, Menghan Hu, Chunjun Hua et al.

Coronavirus Disease 2019 (COVID-19) has spread all over the world since it broke out massively in December 2019, which has caused a large loss to the whole world. Both the confirmed cases and death cases have reached a relatively frightening number. Syndrome coronaviruses 2 (SARS-CoV-2), the cause of COVID-19, can be transmitted by small respiratory droplets. To curb its spread at the source, wearing masks is a convenient and effective measure. In most cases, people use face masks in a high-frequent but short-time way. Aimed at solving the problem that we don't know which service stage of the mask belongs to, we propose a detection system based on the mobile phone. We first extract four features from the GLCMs of the face mask's micro-photos. Next, a three-result detection system is accomplished by using KNN algorithm. The results of validation experiments show that our system can reach a precision of 82.87% (standard deviation=8.5%) on the testing dataset. In future work, we plan to expand the detection objects to more mask types. This work demonstrates that the proposed mobile microscope system can be used as an assistant for face mask being used, which may play a positive role in fighting against COVID-19.

CVApr 15, 2020
Combining Visible Light and Infrared Imaging for Efficient Detection of Respiratory Infections such as COVID-19 on Portable Device

Zheng Jiang, Menghan Hu, Lei Fan et al.

Coronavirus Disease 2019 (COVID-19) has become a serious global epidemic in the past few months and caused huge loss to human society worldwide. For such a large-scale epidemic, early detection and isolation of potential virus carriers is essential to curb the spread of the epidemic. Recent studies have shown that one important feature of COVID-19 is the abnormal respiratory status caused by viral infections. During the epidemic, many people tend to wear masks to reduce the risk of getting sick. Therefore, in this paper, we propose a portable non-contact method to screen the health condition of people wearing masks through analysis of the respiratory characteristics. The device mainly consists of a FLIR one thermal camera and an Android phone. This may help identify those potential patients of COVID-19 under practical scenarios such as pre-inspection in schools and hospitals. In this work, we perform the health screening through the combination of the RGB and thermal videos obtained from the dual-mode camera and deep learning architecture.We first accomplish a respiratory data capture technique for people wearing masks by using face recognition. Then, a bidirectional GRU neural network with attention mechanism is applied to the respiratory data to obtain the health screening result. The results of validation experiments show that our model can identify the health status on respiratory with the accuracy of 83.7\% on the real-world dataset. The abnormal respiratory data and part of normal respiratory data are collected from Ruijin Hospital Affiliated to The Shanghai Jiao Tong University Medical School. Other normal respiratory data are obtained from healthy people around our researchers. This work demonstrates that the proposed portable and intelligent health screening device can be used as a pre-scan method for respiratory infections, which may help fight the current COVID-19 epidemic.

LGFeb 12, 2020
Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner

Yunlu Wang, Menghan Hu, Qingli Li et al.

Research significance: The extended version of this paper has been accepted by IEEE Internet of Things journal (DOI: 10.1109/JIOT.2020.2991456), please cite the journal version. During the epidemic prevention and control period, our study can be helpful in prognosis, diagnosis and screening for the patients infected with COVID-19 (the novel coronavirus) based on breathing characteristics. According to the latest clinical research, the respiratory pattern of COVID-19 is different from the respiratory patterns of flu and the common cold. One significant symptom that occurs in the COVID-19 is Tachypnea. People infected with COVID-19 have more rapid respiration. Our study can be utilized to distinguish various respiratory patterns and our device can be preliminarily put to practical use. Demo videos of this method working in situations of one subject and two subjects can be downloaded online. Research details: Accurate detection of the unexpected abnormal respiratory pattern of people in a remote and unobtrusive manner has great significance. In this work, we innovatively capitalize on depth camera and deep learning to achieve this goal. The challenges in this task are twofold: the amount of real-world data is not enough for training to get the deep model; and the intra-class variation of different types of respiratory patterns is large and the outer-class variation is small. In this paper, considering the characteristics of actual respiratory signals, a novel and efficient Respiratory Simulation Model (RSM) is first proposed to fill the gap between the large amount of training data and scarce real-world data. The proposed deep model and the modeling ideas have the great potential to be extended to large scale applications such as public places, sleep scenario, and office environment.

IVDec 3, 2019
Robust Invisible Hyperlinks in Physical Photographs Based on 3D Rendering Attacks

Jun Jia, Zhongpai Gao, Kang Chen et al.

In the era of multimedia and Internet, people are eager to obtain information from offline to online. Quick Response (QR) codes and digital watermarks help us access information quickly. However, QR codes look ugly and invisible watermarks can be easily broken in physical photographs. Therefore, this paper proposes a novel method to embed hyperlinks into natural images, making the hyperlinks invisible for human eyes but detectable for mobile devices. Our method is an end-to-end neural network with an encoder to hide information and a decoder to recover information. From original images to physical photographs, camera imaging process will introduce a series of distortion such as noise, blur, and light. To train a robust decoder against the physical distortion from the real world, a distortion network based on 3D rendering is inserted between the encoder and the decoder to simulate the camera imaging process. Besides, in order to maintain the visual attraction of the image with hyperlinks, we propose a loss function based on just noticeable difference (JND) to supervise the training of encoder. Experimental results show that our approach outperforms the previous method in both simulated and real situations.

CVJul 12, 2017
Terahertz Security Image Quality Assessment by No-reference Model Observers

Menghan Hu, Xiongkuo Min, Guangtao Zhai et al.

To provide the possibility of developing objective image quality assessment (IQA) algorithms for THz security images, we constructed the THz security image database (THSID) including a total of 181 THz security images with the resolution of 127*380. The main distortion types in THz security images were first analyzed for the design of subjective evaluation criteria to acquire the mean opinion scores. Subsequently, the existing no-reference IQA algorithms, which were 5 opinion-aware approaches viz., NFERM, GMLF, DIIVINE, BRISQUE and BLIINDS2, and 8 opinion-unaware approaches viz., QAC, SISBLIM, NIQE, FISBLIM, CPBD, S3 and Fish_bb, were executed for the evaluation of the THz security image quality. The statistical results demonstrated the superiority of Fish_bb over the other testing IQA approaches for assessing the THz image quality with PLCC (SROCC) values of 0.8925 (-0.8706), and with RMSE value of 0.3993. The linear regression analysis and Bland-Altman plot further verified that the Fish__bb could substitute for the subjective IQA. Nonetheless, for the classification of THz security images, we tended to use S3 as a criterion for ranking THz security image grades because of the relatively low false positive rate in classifying bad THz image quality into acceptable category (24.69%). Interestingly, due to the specific property of THz image, the average pixel intensity gave the best performance than the above complicated IQA algorithms, with the PLCC, SROCC and RMSE of 0.9001, -0.8800 and 0.3857, respectively. This study will help the users such as researchers or security staffs to obtain the THz security images of good quality. Currently, our research group is attempting to make this research more comprehensive.