CVNov 29, 2022
Unsupervised Visual Defect Detection with Score-Based Generative ModelYapeng Teng, Haoyang Li, Fuzhen Cai et al.
Anomaly Detection (AD), as a critical problem, has been widely discussed. In this paper, we specialize in one specific problem, Visual Defect Detection (VDD), in many industrial applications. And in practice, defect image samples are very rare and difficult to collect. Thus, we focus on the unsupervised visual defect detection and localization tasks and propose a novel framework based on the recent score-based generative models, which synthesize the real image by iterative denoising through stochastic differential equations (SDEs). Our work is inspired by the fact that with noise injected into the original image, the defects may be changed into normal cases in the denoising process (i.e., reconstruction). First, based on the assumption that the anomalous data lie in the low probability density region of the normal data distribution, we explain a common phenomenon that occurs when reconstruction-based approaches are applied to VDD: normal pixels also change during the reconstruction process. Second, due to the differences in normal pixels between the reconstructed and original images, a time-dependent gradient value (i.e., score) of normal data distribution is utilized as a metric, rather than reconstruction loss, to gauge the defects. Third, a novel $T$ scales approach is developed to dramatically reduce the required number of iterations, accelerating the inference process. These practices allow our model to generalize VDD in an unsupervised manner while maintaining reasonably good performance. We evaluate our method on several datasets to demonstrate its effectiveness.
LGFeb 23, 2023
A Deep Neural Network Based Approach to Building Budget-Constrained Models for Big Data AnalysisRui Ming, Haiping Xu, Shannon E. Gibbs et al.
Deep learning approaches require collection of data on many different input features or variables for accurate model training and prediction. Since data collection on input features could be costly, it is crucial to reduce the cost by selecting a subset of features and developing a budget-constrained model (BCM). In this paper, we introduce an approach to eliminating less important features for big data analysis using Deep Neural Networks (DNNs). Once a DNN model has been developed, we identify the weak links and weak neurons, and remove some input features to bring the model cost within a given budget. The experimental results show our approach is feasible and supports user selection of a suitable BCM within a given budget.
CVOct 8, 2018Code
Arc-support Line Segments Revisited: An Efficient and High-quality Ellipse DetectionChangsheng Lu, Siyu Xia, Ming Shao et al.
Over the years many ellipse detection algorithms spring up and are studied broadly, while the critical issue of detecting ellipses accurately and efficiently in real-world images remains a challenge. In this paper, we propose a valuable industry-oriented ellipse detector by arc-support line segments, which simultaneously reaches high detection accuracy and efficiency. To simplify the complicated curves in an image while retaining the general properties including convexity and polarity, the arc-support line segments are extracted, which grounds the successful detection of ellipses. The arc-support groups are formed by iteratively and robustly linking the arc-support line segments that latently belong to a common ellipse. Afterward, two complementary approaches, namely, locally selecting the arc-support group with higher saliency and globally searching all the valid paired groups, are adopted to fit the initial ellipses in a fast way. Then, the ellipse candidate set can be formulated by hierarchical clustering of 5D parameter space of initial ellipses. Finally, the salient ellipse candidates are selected and refined as detections subject to the stringent and effective verification. Extensive experiments on three public datasets are implemented and our method achieves the best F-measure scores compared to the state-of-the-art methods. The source code is available at https://github.com/AlanLuSun/High-quality-ellipse-detection.
CVApr 2, 2024
Sketch3D: Style-Consistent Guidance for Sketch-to-3D GenerationWangguandong Zheng, Haifeng Xia, Rui Chen et al.
Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To overcome them, this paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Concretely, Sketch3D first instantiates the given sketch in the reference image through the shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of our Sketch3D in generating realistic 3D assets while preserving consistency with the input.
LGJul 15, 2025
Toward Improving fNIRS Classification: A Study on Activation Functions in Deep Neural ArchitecturesBehtom Adeli, John McLinden, Pankaj Pandey et al.
Activation functions are critical to the performance of deep neural networks, particularly in domains such as functional near-infrared spectroscopy (fNIRS), where nonlinearity, low signal-to-noise ratio (SNR), and signal variability poses significant challenges to model accuracy. However, the impact of activation functions on deep learning (DL) performance in the fNIRS domain remains underexplored and lacks systematic investigation in the current literature. This study evaluates a range of conventional and field-specific activation functions for fNIRS classification tasks using multiple deep learning architectures, including the domain-specific fNIRSNet, AbsoluteNet, MDNN, and shallowConvNet (as the baseline), all tested on a single dataset recorded during an auditory task. To ensure fair a comparison, all networks were trained and tested using standardized preprocessing and consistent training parameters. The results show that symmetrical activation functions such as Tanh and the Absolute value function Abs(x) can outperform commonly used functions like the Rectified Linear Unit (ReLU), depending on the architecture. Additionally, a focused analysis of the role of symmetry was conducted using a Modified Absolute Function (MAF), with results further supporting the effectiveness of symmetrical activation functions on performance gains. These findings underscore the importance of selecting proper activation functions that align with the signal characteristics of fNIRS data.
LGMay 27, 2025
AbsoluteNet: A Deep Learning Neural Network to Classify Cerebral Hemodynamic Responses of Auditory ProcessingBehtom Adeli, John Mclinden, Pankaj Pandey et al.
In recent years, deep learning (DL) approaches have demonstrated promising results in decoding hemodynamic responses captured by functional near-infrared spectroscopy (fNIRS), particularly in the context of brain-computer interface (BCI) applications. This work introduces AbsoluteNet, a novel deep learning architecture designed to classify auditory event-related responses recorded using fNIRS. The proposed network is built upon principles of spatio-temporal convolution and customized activation functions. Our model was compared against several models, namely fNIRSNET, MDNN, DeepConvNet, and ShallowConvNet. The results showed that AbsoluteNet outperforms existing models, reaching 87.0% accuracy, 84.8% sensitivity, and 89.2% specificity in binary classification, surpassing fNIRSNET, the second-best model, by 3.8% in accuracy. These findings underscore the effectiveness of our proposed deep learning model in decoding hemodynamic responses related to auditory processing and highlight the importance of spatio-temporal feature aggregation and customized activation functions to better fit fNIRS dynamics.
CVApr 30, 2024
Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action RecognitionZhendong Liu, Haifeng Xia, Tong Guo et al.
Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purposes. Although the global features learned through the procedure are suitable for the general classification, they have difficulty capturing fine-grained action change across adjacent frames -- decisive factors in sports actions. In this paper, we propose a novel ``Cross-block Fine-grained Semantic Cascade (CFSC)'' module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks to allow networks to focus on action details. In particular, the CFSC module utilizes the GCN feature maps produced at different levels, as well as aggregated features from proceeding levels to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which will be carried over from shallow to deep layers to maximize the leverage of low-level details. This cross-block feature aggregation methodology, capable of mitigating the loss of fine-grained information, has resulted in improved performance. Last, FD-7, a new action recognition dataset for fencing sports, was collected and will be made publicly available. Experimental results and empirical analysis on public benchmarks (FSD-10) and self-collected (FD-7) demonstrate the advantage of our CFSC module on learning discriminative patterns for action classification over others.
CVNov 17, 2025
Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor AttacksMd. Iqbal Hossain, Afia Sajeeda, Neeresh Kumar Perla et al.
The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these models are not safe for adversarial attacks, particularly backdoor attacks, which can subtly manipulate model behavior. Moreover, existing defense methods typically involve training from scratch or fine-tuning using a large dataset without pinpointing the specific labels that are affected. In this study, we introduce an innovative strategy to enhance the robustness of multimodal contrastive learning models against such attacks. In particular, given a poisoned CLIP model, our approach can identify the backdoor trigger and pinpoint the victim samples and labels in an efficient manner. To that end, an image segmentation ``oracle'' is introduced as the supervisor for the output of the poisoned CLIP. We develop two algorithms to rectify the poisoned model: (1) differentiating between CLIP and Oracle's knowledge to identify potential triggers; (2) pinpointing affected labels and victim samples, and curating a compact fine-tuning dataset. With this knowledge, we are allowed to rectify the poisoned CLIP model to negate backdoor effects. Extensive experiments on visual recognition benchmarks demonstrate our strategy is effective in CLIP-based backdoor defense.
CVDec 3, 2023
Few-shot Shape Recognition by Learning Deep Shape-aware FeaturesWenlong Shi, Changsheng Lu, Ming Shao et al.
Traditional shape descriptors have been gradually replaced by convolutional neural networks due to their superior performance in feature extraction and classification. The state-of-the-art methods recognize object shapes via image reconstruction or pixel classification. However , these methods are biased toward texture information and overlook the essential shape descriptions, thus, they fail to generalize to unseen shapes. We are the first to propose a fewshot shape descriptor (FSSD) to recognize object shapes given only one or a few samples. We employ an embedding module for FSSD to extract transformation-invariant shape features. Secondly, we develop a dual attention mechanism to decompose and reconstruct the shape features via learnable shape primitives. In this way, any shape can be formed through a finite set basis, and the learned representation model is highly interpretable and extendable to unseen shapes. Thirdly, we propose a decoding module to include the supervision of shape masks and edges and align the original and reconstructed shape features, enforcing the learned features to be more shape-aware. Lastly, all the proposed modules are assembled into a few-shot shape recognition scheme. Experiments on five datasets show that our FSSD significantly improves the shape classification compared to the state-of-the-art under the few-shot setting.
CVOct 31, 2021
The 5th Recognizing Families in the Wild Data Challenge: Predicting Kinship from FacesJoseph P. Robinson, Can Qin, Ming Shao et al.
Recognizing Families In the Wild (RFIW), held as a data challenge in conjunction with the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG), is a large-scale, multi-track visual kinship recognition evaluation. For the fifth edition of RFIW, we continue to attract scholars, bring together professionals, publish new work, and discuss prospects. In this paper, we summarize submissions for the three tasks of this year's RFIW: specifically, we review the results for kinship verification, tri-subject verification, and family member search and retrieval. We look at the RFIW problem, share current efforts, and make recommendations for promising future directions.
CVAug 26, 2020
Discriminative Cross-Domain Feature Learning for Partial Domain AdaptationTaotao Jing, Ming Shao, Zhengming Ding
Partial domain adaptation aims to adapt knowledge from a larger and more diverse source domain to a smaller target domain with less number of classes, which has attracted appealing attention. Recent practice on domain adaptation manages to extract effective features by incorporating the pseudo labels for the target domain to better fight off the cross-domain distribution divergences. However, it is essential to align target data with only a small set of source data. In this paper, we develop a novel Discriminative Cross-Domain Feature Learning (DCDF) framework to iteratively optimize target labels with a cross-domain graph in a weighted scheme. Specifically, a weighted cross-domain center loss and weighted cross-domain graph propagation are proposed to couple unlabeled target data to related source samples for discriminative cross-domain feature learning, where irrelevant source centers will be ignored, to alleviate the marginal and conditional disparities simultaneously. Experimental evaluations on several popular benchmarks demonstrate the effectiveness of our proposed approach on facilitating the recognition for the unlabeled target domain, through comparing it to the state-of-the-art partial domain adaptation approaches.
CVJul 28, 2020
Families In Wild Multimedia: A Multimodal Database for Recognizing KinshipJoseph P. Robinson, Zaid Khan, Yu Yin et al.
Kinship, a soft biometric detectable in media, is fundamental for a myriad of use-cases. Despite the difficulty of detecting kinship, annual data challenges using still-images have consistently improved performances and attracted new researchers. Now, systems reach performance levels unforeseeable a decade ago, closing in on performances acceptable to deploy in practice. Like other biometric tasks, we expect systems can receive help from other modalities. We hypothesize that adding modalities to FIW, which has only still-images, will improve performance. Thus, to narrow the gap between research and reality and enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions). Specifically, we introduce the first publicly available multi-task MM kinship dataset. To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data, requiring minimal human input and no financial cost. The proposed MM corpus allows the problem statements to be more realistic template-based protocols. We show significant improvements in all benchmarks with the added modalities. The results highlight edge cases to inspire future research with different areas of improvement. FIW MM supplies the data needed to increase the potential of automated systems to detect kinship in MM. It also allows experts from diverse fields to collaborate in novel ways.
CVJun 29, 2020
Survey on the Analysis and Modeling of Visual Kinship: A Decade in the MakingJoseph P Robinson, Ming Shao, Yun Fu
Kinship recognition is a challenging problem with many practical applications. With much progress and milestones having been reached after ten years - we are now able to survey the research and create new milestones. We review the public resources and data challenges that enabled and inspired many to hone-in on the views of automatic kinship recognition in the visual domain. The different tasks are described in technical terms and syntax consistent across the problem domain and the practical value of each discussed and measured. State-of-the-art methods for visual kinship recognition problems, whether to discriminate between or generate from, are examined. As part of such, we review systems proposed as part of a recent data challenge held in conjunction with the 2020 IEEE Conference on Automatic Face and Gesture Recognition. We establish a stronghold for the state of progress for the different problems in a consistent manner. This survey will serve as the central resource for the work of the next decade to build upon. For the tenth anniversary, the demo code is provided for the various kin-based tasks. Detecting relatives with visual recognition and classifying the relationship is an area with high potential for impact in research and practice.IEEE Transactions on pattern analysis and machine intelligence
CVFeb 15, 2020
Recognizing Families In the Wild: White Paper for the 4th Edition Data ChallengeJoseph P. Robinson, Yu Yin, Zaid Khan et al.
Recognizing Families In the Wild (RFIW): an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems on scales much higher than ever before. Organized in conjunction with the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG) as a Challenge, RFIW provides a platform for publishing original work and the gathering of experts for a discussion of the next steps. This paper summarizes the supported tasks (i.e., kinship verification, tri-subject verification, and search & retrieval of missing children) in the evaluation protocols, which include the practical motivation, technical background, data splits, metrics, and benchmark results. Furthermore, top submissions (i.e., leader-board stats) are listed and reviewed as a high-level analysis on the state of the problem. In the end, the purpose of this paper is to describe the 2020 RFIW challenge, end-to-end, along with forecasts in promising future directions.
LGNov 16, 2019
What Will Your Child Look Like? DNA-Net: Age and Gender Aware Kin Face SynthesizerPengyu Gao, Siyu Xia, Joseph Robinson et al.
Visual kinship recognition aims to identify blood relatives from facial images. Its practical application-- like in law-enforcement, video surveillance, automatic family album management, and more-- has motivated many researchers to put forth effort on the topic as of recent. In this paper, we focus on a new view of visual kinship technology: kin-based face generation. Specifically, we propose a two-stage kin-face generation model to predict the appearance of a child given a pair of parents. The first stage includes a deep generative adversarial autoencoder conditioned on ages and genders to map between facial appearance and high-level features. The second stage is our proposed DNA-Net, which serves as a transformation between the deep and genetic features based on a random selection process to fuse genes of a parent pair to form the genes of a child. We demonstrate the effectiveness of the proposed method quantitatively and qualitatively: quantitatively, pre-trained models and human subjects perform kinship verification on the generated images of children; qualitatively, we show photo-realistic face images of children that closely resemble the given pair of parents. In the end, experiments validate that the proposed model synthesizes convincing kin-faces using both subjective and objective standards.
CVOct 25, 2019
LPRNet: Lightweight Deep Network by Low-rank Pointwise Residual ConvolutionBin Sun, Jun Li, Ming Shao et al.
Deep learning has become popular in recent years primarily due to the powerful computing device such as GPUs. However, deploying these deep models to end-user devices, smart phones, or embedded systems with limited resources is challenging. To reduce the computation and memory costs, we propose a novel lightweight deep learning module by low-rank pointwise residual (LPR) convolution, called LPRNet. Essentially, LPR aims at using low-rank approximation in pointwise convolution to further reduce the module size, while keeping depthwise convolutions as the residual module to rectify the LPR module. This is critical when the low-rankness undermines the convolution process. We embody our design by replacing modules of identical input-output dimension in MobileNet and ShuffleNetv2. Experiments on visual recognition tasks including image classification and face alignment on popular benchmarks show that our LPRNet achieves competitive performance but with significant reduction of Flops and memory cost compared to the state-of-the-art deep models focusing on model compression.
CVOct 25, 2019
Real-time Memory Efficient Large-pose Face Alignment via Deep Evolutionary NetworkBin Sun, Ming Shao, Siyu Xia et al.
There is an urgent need to apply face alignment in a memory-efficient and real-time manner due to the recent explosion of face recognition applications. However, impact factors such as large pose variation and computational inefficiency, still hinder its broad implementation. To this end, we propose a computationally efficient deep evolutionary model integrated with 3D Diffusion Heap Maps (DHM). First, we introduce a sparse 3D DHM to assist the initial modeling process under extreme pose conditions. Afterward, a simple and effective CNN feature is extracted and fed to Recurrent Neural Network (RNN) for evolutionary learning. To accelerate the model, we propose an efficient network structure to accelerate the evolutionary learning process through a factorization strategy. Extensive experiments on three popular alignment databases demonstrate the advantage of the proposed models over the state-of-the-art, especially under large-pose conditions. Notably, the computational speed of our model is 6 times faster than the state-of-the-art on CPU and 14 times on GPU. We also discuss and analyze the limitations of our models and future research work.
LGSep 28, 2019
Learning Robust Data Representation: A Knowledge Flow PerspectiveZhengming Ding, Ming Shao, Handong Zhao et al.
It is always demanding to learn robust visual representation for various learning problems; however, this learning and maintenance process usually suffers from noise, incompleteness or knowledge domain mismatch. Thus, robust representation learning by removing noisy features or samples, complementing incomplete data, and mitigating the distribution difference becomes the key. Along this line of research, low-rank modeling has been widely-applied to solving representation learning challenges. This survey covers the topic from a knowledge flow perspective in terms of: (1) robust knowledge recovery, (2) robust knowledge transfer, and (3) robust knowledge fusion, centered around several major applications. First of all, we deliver a unified formulation for robust knowledge discovery given single dataset. Second, we discuss robust knowledge transfer and fusion given multiple datasets with different knowledge flows, followed by practical challenges, model variations, and remarks. Finally, we highlight future research of robust knowledge discovery for incomplete, unbalance, large-scale data analysis. This would benefit AI community from literature review to future direction.
MEJan 2, 2019
Cost-sensitive Selection of Variables by Ensemble of Model SequencesDonghui Yan, Zhiwei Qin, Songxiang Gu et al.
Many applications require the collection of data on different variables or measurements over many system performance metrics. We term those broadly as measures or variables. Often data collection along each measure incurs a cost, thus it is desirable to consider the cost of measures in modeling. This is a fairly new class of problems in the area of cost-sensitive learning. A few attempts have been made to incorporate costs in combining and selecting measures. However, existing studies either do not strictly enforce a budget constraint, or are not the `most' cost effective. With a focus on classification problem, we propose a computationally efficient approach that could find a near optimal model under a given budget by exploring the most `promising' part of the solution space. Instead of outputting a single model, we produce a model schedule -- a list of models, sorted by model costs and expected predictive accuracy. This could be used to choose the model with the best predictive accuracy under a given budget, or to trade off between the budget and the predictive accuracy. Experiments on some benchmark datasets show that our approach compares favorably to competing methods.
CVApr 7, 2016
Families in the Wild (FIW): Large-Scale Kinship Image Database and BenchmarksJoseph P. Robinson, Ming Shao, Yue Wu et al.
We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captivates the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool designed to mark complex hierarchical relationships and local label information in a quick and efficient manner. We include several benchmarks for two image-based tasks, kinship verification and family recognition. For this, we incorporate several visual features and metric learning methods as baselines. Also, we demonstrate that a pre-trained Convolutional Neural Network (CNN) as an off-the-shelf feature extractor outperforms the other feature types. Then, results were further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss function was learned on top of the network of pre-trained weights; (2) for family recognition, a family-specific softmax classifier was added to the network.