MM · Apr 30, 2023
Interpretability of Machine Learning: Recent Advances and Future Prospects
Lei Gao, Ling Guan
The proliferation of machine learning (ML) has drawn unprecedented interest in the study of multimedia content such as text, images, audio, and video. Consequently, understanding and learning ML-based representations has taken center stage in knowledge discovery for intelligent multimedia research and applications. Nevertheless, the black-box nature of contemporary ML, especially deep neural networks (DNNs), poses a primary challenge for ML-based representation learning. To address this black-box problem, research on the interpretability of ML has attracted tremendous interest in recent years. This paper surveys recent advances and future prospects in the interpretability of ML, with several application examples pertinent to multimedia computing, including text-image cross-modal representation learning, face recognition, and object recognition. The survey shows that the interpretability of ML is an important research direction that merits further investment.
CV · Oct 28, 2021
ODMTCNet: An Interpretable Multi-view Deep Neural Network Architecture for Image Feature Representation
Lei Gao, Zheng Guo, Ling Guan
This work proposes an interpretable multi-view deep neural network architecture, namely the optimal discriminant multi-view tensor convolutional network (ODMTCNet), by integrating statistical machine learning (SML) principles with the deep neural network (DNN) architecture.
LG · Jul 21, 2021
ECG Heartbeat Classification Using Multimodal Fusion
Zeeshan Ahmad, Anika Tabassum, Ling Guan et al.
The electrocardiogram (ECG) is an authoritative source for diagnosing and countering critical cardiovascular syndromes such as arrhythmia and myocardial infarction (MI). Current machine learning techniques depend either on manually extracted features or on large and complex deep learning networks that merely use the 1D ECG signal directly. Since intelligent multimodal fusion can perform at the state-of-the-art level with an efficient deep network, in this paper we propose two computationally efficient multimodal fusion frameworks for ECG heartbeat classification, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). At the input of these frameworks, we convert the raw ECG data into three different images using the Gramian Angular Field (GAF), Recurrence Plot (RP), and Markov Transition Field (MTF). In MIF, we first perform image fusion by combining the three imaging modalities into a single image modality, which serves as input to a Convolutional Neural Network (CNN). In MFF, we extract features from the penultimate layer of the CNNs and fuse them to obtain the unique and interdependent information needed for better classifier performance. These features are finally used to train a Support Vector Machine (SVM) classifier for ECG heartbeat classification. We demonstrate the superiority of the proposed fusion models through experiments on PhysioNet's MIT-BIH dataset for five distinct arrhythmia conditions, consistent with the AAMI EC57 protocols, and on the PTB Diagnostic dataset for MI classification. We achieve classification accuracies of 99.7% and 99.2% on arrhythmia and MI classification, respectively.
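The three signal-to-image transforms named above are standard techniques. Below is a minimal NumPy sketch of each, assuming a single, already segmented heartbeat given as a 1D array. The helper names, the recurrence threshold `eps`, and the bin count `n_bins` are illustrative choices, not the authors' settings. The GAF shown is the summation variant, and the RP skips phase-space embedding for brevity.

```python
import numpy as np

def rescale(x):
    """Min-max rescale a 1D signal to [-1, 1], as required by the GAF polar encoding."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def gramian_angular_field(x):
    """Gramian Angular Summation Field: G[i, j] = cos(phi_i + phi_j), phi = arccos(x)."""
    phi = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

def recurrence_plot(x, eps=None):
    """Binary recurrence plot: R[i, j] = 1 if |x_i - x_j| <= eps (no embedding)."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    if eps is None:
        eps = 0.1 * d.max()  # hypothetical default threshold
    return (d <= eps).astype(float)

def markov_transition_field(x, n_bins=8):
    """MTF: M[i, j] = P(next sample in bin of x_j | current sample in bin of x_i)."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # quantile bin edges
    q = np.digitize(x, edges)  # bin index of each sample, in [0, n_bins - 1]
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):  # count transitions between consecutive samples
        W[a, b] += 1
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return W[q[:, None], q[None, :]]  # spread transition probabilities over time pairs

t = np.linspace(0, 2 * np.pi, 64)
beat = np.sin(t)  # stand-in for one segmented heartbeat
images = [gramian_angular_field(beat), recurrence_plot(beat), markov_transition_field(beat)]
print([im.shape for im in images])  # [(64, 64), (64, 64), (64, 64)]
```

Each transform turns an n-sample beat into an n × n image, which is what lets a 2D CNN consume the 1D signal.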
SP · May 28, 2021
ECG Heart-beat Classification Using Multimodal Image Fusion
Zeeshan Ahmad, Anika Tabassum, Naimul Khan et al.
In this paper, we present a novel Image Fusion Model (IFM) for ECG heartbeat classification that overcomes the weaknesses of existing machine learning techniques, which rely either on manual feature extraction or on direct use of the raw 1D ECG signal. At the input of IFM, we first convert ECG heartbeats into three different images using the Gramian Angular Field (GAF), Recurrence Plot (RP), and Markov Transition Field (MTF), and then fuse these images into a single imaging modality. We use AlexNet for both feature extraction and classification, yielding an end-to-end deep learning pipeline. We perform experiments on the PhysioNet MIT-BIH dataset for five different arrhythmias in accordance with the AAMI EC57 standard, and on the PTB Diagnostic dataset for myocardial infarction (MI) classification. We achieve state-of-the-art results in terms of prediction accuracy, precision, and recall.
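The abstract does not spell out the fusion operator. One simple possibility, assumed here purely for illustration, is to normalize each of the three images and stack them as the channels of a single RGB-like image, which matches the three-channel input that AlexNet expects. The function names are hypothetical, and resizing to AlexNet's input resolution is omitted.

```python
import numpy as np

def to_unit_range(img):
    """Min-max normalize one 2D image to [0, 1] so the three channels share a scale."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    return np.zeros_like(img) if hi == lo else (img - lo) / (hi - lo)

def fuse_as_channels(gaf, rp, mtf):
    """Stack three same-size 2D images into one H x W x 3 image, one modality per channel."""
    if not (gaf.shape == rp.shape == mtf.shape):
        raise ValueError("all three images must have the same shape")
    return np.stack([to_unit_range(gaf), to_unit_range(rp), to_unit_range(mtf)], axis=-1)
```

Channel stacking keeps each modality spatially aligned with the others, so the first convolutional layer can learn cross-modality filters directly.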
LG · Mar 9, 2021
A Discriminative Vectorial Framework for Multi-modal Feature Representation
Lei Gao, Ling Guan
Due to rapid advances in sensing and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding ways to extract useful information from these multi-modal data sources has quickly become a necessity. In this paper, a discriminative vectorial framework is proposed for multi-modal feature representation in knowledge discovery, employing multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis. Specifically, the proposed framework jointly minimizes the semantic similarity among different modalities by MH and extracts intrinsic discriminative representations across multiple data sources by DCM analysis, yielding a novel vectorial framework for multi-modal feature representation. Moreover, the proposed feature representation strategy is analyzed and further optimized for the canonical and non-canonical cases, respectively. Consequently, the generated feature representation makes effective use of high-quality input data sources, producing improved, and in some cases markedly better, results in various applications. The effectiveness and generality of the proposed framework are demonstrated using both classical features and deep neural network (DNN) based features on image and multimedia analysis and recognition tasks, including data visualization, face recognition, object recognition, cross-modal (text-image) recognition, and audio emotion recognition. Experimental results show that the proposed solutions outperform state-of-the-art statistical machine learning (SML) and DNN algorithms.
LG · Feb 28, 2021
A Complete Discriminative Tensor Representation Learning for Two-Dimensional Correlation Analysis
Lei Gao, Ling Guan
As an effective tool for two-dimensional data analysis, two-dimensional canonical correlation analysis (2DCCA) not only preserves the intrinsic structural information of the original two-dimensional (2D) data but also effectively reduces computational complexity. However, because it is unsupervised, 2DCCA cannot extract sufficiently discriminative representations, which results in unsatisfactory performance. In this letter, we propose a complete discriminative tensor representation learning (CDTRL) method based on linear correlation analysis for analyzing 2D signals (e.g., images). This letter shows that the complete discriminative tensor representation strategy provides an effective vehicle for revealing and extracting discriminant representations across 2D data sets, leading to improved results. Experimental results show that the proposed CDTRL outperforms state-of-the-art methods on the evaluated data sets.
LG · Feb 28, 2021
Discriminative Multiple Canonical Correlation Analysis for Information Fusion
Lei Gao, Lin Qi, Enqing Chen et al.
In this paper, we propose Discriminative Multiple Canonical Correlation Analysis (DMCCA) for multimodal information analysis and fusion. DMCCA is capable of extracting more discriminative characteristics from multimodal information representations. Specifically, it finds the projection directions that simultaneously maximize the within-class correlation and minimize the between-class correlation, leading to better utilization of the multimodal information. In the process, we analytically demonstrate that the optimal projection dimension of DMCCA can be predicted quite accurately, leading to both superior performance and a substantial reduction in computational cost. We further verify that Canonical Correlation Analysis (CCA), Multiple Canonical Correlation Analysis (MCCA), and Discriminative Canonical Correlation Analysis (DCCA) are special cases of DMCCA, thus establishing a unified framework for canonical correlation analysis. We implement a prototype of DMCCA to demonstrate its performance in handwritten digit recognition and human emotion recognition. Extensive experiments show that DMCCA outperforms the traditional methods of serial fusion, CCA, MCCA, and DCCA.
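Since plain CCA is stated to be a special case of DMCCA, a baseline two-view CCA sketch may help fix ideas. This is standard CCA computed from the SVD of the whitened cross-covariance matrix, not DMCCA itself: the class-label terms that DMCCA adds (maximizing within-class and minimizing between-class correlation) are not modeled, and the ridge term `reg` is an illustrative assumption for numerical stability.

```python
import numpy as np

def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y, k, reg=1e-6):
    """Two-view CCA. X: (n, dx), Y: (n, dy). Returns Wx (dx, k), Wy (dy, k), top-k correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]
```

Fused features for a downstream classifier could then be formed, for example, by concatenating `X @ Wx` and `Y @ Wy` (serial fusion of the projected views).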
CV · Feb 28, 2021
The Labeled Multiple Canonical Correlation Analysis for Information Fusion
Lei Gao, Rui Zhang, Lin Qi et al.
The objective of multimodal information fusion is to mathematically analyze information carried in different sources and create a new representation that can be used more effectively in pattern recognition and other multimedia information processing tasks. In this paper, we introduce a new method for multimodal information fusion and representation based on Labeled Multiple Canonical Correlation Analysis (LMCCA). By incorporating the class label information of the training samples, the proposed LMCCA ensures that the fused features carry discriminative characteristics of the multimodal information representations and provide superior recognition performance. We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition, and object recognition using multiple features, and on bimodal human emotion recognition involving information from both the audio and visual domains. The generic nature of LMCCA allows it to take as input features extracted by any means, including deep learning (DL) methods. Experimental results show that the proposed method enhances the performance of both statistical machine learning (SML) methods and DL-based methods.
CV · Feb 27, 2021
Online Behavioral Analysis with Application to Emotion State Identification
Lei Gao, Lin Qi, Ling Guan
In this paper, we propose a novel discriminative model for online behavioral analysis with application to emotion state identification. The proposed model effectively extracts more discriminative characteristics from behavioral data and efficiently finds the optimal projection direction, satisfying the requirements of online data analysis and making better use of the behavioral information to produce more accurate recognition results.
CV · Oct 13, 2020
A Scale and Rotational Invariant Key-point Detector based on Sparse Coding
Thanh Hong-Phuoc, Ling Guan
Most popular hand-crafted key-point detectors, such as Harris corner, SIFT, and SURF, aim to detect corners, blobs, junctions, or other human-defined structures in images. Although they are robust to some geometric transformations, unintended scenarios or non-uniform lighting variations can significantly degrade their performance. Hence, a new detector that adapts to context changes while remaining robust to both geometric and non-uniform illumination variations is highly desirable. In this paper, we address this challenging problem by incorporating a Scale and Rotation Invariant design (named SRI-SCK) into a recently developed Sparse Coding based Key-point detector (SCK). The SCK detector is flexible across scenarios and fully invariant to affine intensity changes, but it is not designed to handle images with drastic scale and rotation changes. In SRI-SCK, scale invariance is implemented with an image pyramid, while rotation invariance is realized by combining multiple rotated versions of the dictionary used in the sparse coding step of SCK. Techniques for computing key-points' characteristic scales and their sub-pixel-accurate positions are also proposed. Experimental results on three public datasets demonstrate significantly high repeatability and matching scores.
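The two invariance mechanisms can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SRI-SCK itself: the pyramid halves resolution by 2×2 block averaging (no Gaussian smoothing), and the dictionary is augmented only with 90°, 180°, and 270° rotations via `np.rot90`, because the abstract does not say which angles are used and finer rotations would require interpolation. The function names are hypothetical.

```python
import numpy as np

def image_pyramid(img, n_levels=3):
    """Simple pyramid: each level halves resolution by 2x2 block averaging (no blur)."""
    levels = [np.asarray(img, dtype=float)]
    for _ in range(n_levels - 1):
        cur = levels[-1]
        h, w = (cur.shape[0] // 2) * 2, (cur.shape[1] // 2) * 2  # crop to even size
        if h == 0 or w == 0:
            break
        cur = cur[:h, :w]
        levels.append(cur.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return levels

def rotated_dictionary(atoms, patch_size):
    """Augment a dictionary (columns = flattened square atoms) with 90/180/270-degree rotations."""
    out = []
    for k in range(4):  # k quarter-turns, k = 0 keeps the original atoms
        for col in atoms.T:
            out.append(np.rot90(col.reshape(patch_size, patch_size), k).ravel())
    return np.stack(out, axis=1)
```

Sparse-coding each pyramid level against the augmented dictionary lets a structure be matched at a coarser scale or in a rotated orientation by the same detector.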
CV · Jul 12, 2020
Locality Guided Neural Networks for Explainable Artificial Intelligence
Randy Tan, Naimul Khan, Ling Guan
In current deep network architectures, deeper layers tend to contain hundreds of independent neurons, which makes it hard for humans to understand how they interact with each other. By organizing neurons by correlation, humans can observe how clusters of neighbouring neurons interact. In this paper, we propose a novel back-propagation algorithm, called Locality Guided Neural Network (LGNN), for training networks that preserve locality between neighbouring neurons within each layer of a deep network. Heavily motivated by the Self-Organizing Map (SOM), the goal is to enforce a local topology on each layer of a deep network such that neighbouring neurons are highly correlated with each other. This method contributes to the domain of Explainable Artificial Intelligence (XAI), which aims to alleviate the black-box nature of current AI methods and make them understandable by humans. Our method aims to achieve XAI in deep learning without changing the structure of current models or requiring any post-processing. This paper focuses on Convolutional Neural Networks (CNNs), but the method can theoretically be applied to any type of deep learning architecture. In our experiments, we train various VGG and Wide ResNet (WRN) networks for image classification on CIFAR100. In-depth analyses presenting both qualitative and quantitative results demonstrate that our method is capable of enforcing a topology on each layer while achieving a small increase in classification accuracy.
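Because LGNN is described as heavily motivated by SOM, the classic SOM neighbourhood update may help clarify where the locality idea comes from. This sketch shows SOM only, not the LGNN back-propagation rule, whose details the abstract does not give. The 1D grid of units, the learning rate `lr`, and the neighbourhood width `sigma` are illustrative.

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One Self-Organizing Map update on a 1D grid of units.

    weights: (n_units, dim); x: (dim,). The best-matching unit (BMU) and its grid
    neighbours move toward x, with a Gaussian falloff in grid distance.
    Returns (new_weights, bmu_index); the input array is not modified.
    """
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    grid_dist = np.arange(weights.shape[0]) - bmu
    h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))  # neighbourhood function
    return weights + lr * h[:, None] * (x - weights), bmu
```

Because neighbours of the winning unit are pulled toward the same input, adjacent units end up responding to similar inputs; LGNN aims for the analogous effect of highly correlated neighbouring neurons within a layer.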
CV · Feb 13, 2019
Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond
Naimul Mefraz Khan, Nabila Abraham, Ling Guan
In this paper, we highlight three issues that limit the performance of machine learning on biomedical images and tackle them through three case studies: 1) Interactive Machine Learning (IML): we show how IML can drastically improve the exploration time and quality of direct volume rendering; 2) transfer learning: we show how transfer learning combined with intelligent pre-processing can yield better Alzheimer's diagnosis with a much smaller training set; 3) data imbalance: we show how our novel focal Tversky loss function provides better segmentation results by taking into account the imbalanced nature of segmentation datasets. The case studies are accompanied by an in-depth analytical discussion of the results and possible future directions.
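A minimal NumPy version of a focal Tversky loss for the binary case, following the commonly cited form FTL = (1 − TI)^(1/γ) with Tversky index TI = TP / (TP + α·FN + β·FP). The defaults α = 0.7, β = 0.3, γ = 4/3 and the smoothing constant are assumptions for illustration; the paper should be consulted for its exact settings and multi-class form. A training implementation would use differentiable tensor operations rather than NumPy.

```python
import numpy as np

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=4/3, smooth=1e-6):
    """Focal Tversky loss for binary segmentation.

    pred: predicted foreground probabilities in [0, 1]; target: binary ground truth.
    alpha weights false negatives and beta weights false positives, so alpha > beta
    penalizes missed (often small, minority-class) foreground more heavily.
    """
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    tp = np.sum(p * g)          # soft true positives
    fn = np.sum((1 - p) * g)    # soft false negatives
    fp = np.sum(p * (1 - g))    # soft false positives
    tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return (1.0 - tversky) ** (1.0 / gamma)
```

With α = β = 0.5 and γ = 1, the Tversky index equals the Dice coefficient, so the loss reduces to 1 − Dice; raising α above β is what shifts the balance toward recall on imbalanced data.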
CV · Feb 7, 2018
SCK: A sparse coding based key-point detector
Thanh Hong-Phuoc, Yifeng He, Ling Guan
All current popular hand-crafted key-point detectors, such as Harris corner, MSER, SIFT, and SURF, rely on specific pre-designed structures to detect corners, blobs, or junctions in an image. In this paper, a novel sparse coding based key-point detector that requires no particular pre-designed structures is presented. The detector measures the complexity level of each block in an image to decide where a key-point should be. The complexity level of a block is defined as the total number of non-zero components of its sparse representation. Generally, a block constructed from more components is more complex and has greater potential to be a good key-point. Experimental results on the Webcam and EF datasets [1, 2] show that the proposed detector achieves significantly high repeatability compared to hand-crafted features, and its matching scores even exceed those of the state-of-the-art learning-based detector.
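The complexity measure can be sketched as follows. This assumes a pre-learned dictionary `D` with unit-norm columns and uses Orthogonal Matching Pursuit (OMP) as the sparse coder, stopping when the relative residual falls below a tolerance `tol`. The abstract does not name the sparse coding algorithm, block size, or tolerance, so these are illustrative choices, and the function names are hypothetical. Non-overlapping blocks are used for brevity; a dense detector would score overlapping blocks.

```python
import numpy as np

def omp_support_size(D, y, tol=1e-3, max_atoms=None):
    """OMP: number of atoms (non-zero coefficients) needed until ||residual|| <= tol * ||y||.

    D: (dim, n_atoms) dictionary with unit-norm columns; y: (dim,) flattened block.
    """
    y = np.asarray(y, dtype=float)
    norm_y = np.linalg.norm(y)
    if norm_y == 0:
        return 0  # an all-zero block needs no atoms
    max_atoms = max_atoms or D.shape[1]
    support, residual = [], y.copy()
    while len(support) < max_atoms and np.linalg.norm(residual) > tol * norm_y:
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break  # no further progress possible
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # re-fit on full support
        residual = y - D[:, support] @ coef
    return len(support)

def block_complexity(img, D, block=8, tol=1e-3):
    """Complexity map: count of non-zero sparse coefficients for each non-overlapping block."""
    h, w = img.shape
    out = np.zeros((h // block, w // block), dtype=int)
    for i in range(h // block):
        for j in range(w // block):
            patch = img[i * block:(i + 1) * block, j * block:(j + 1) * block].ravel()
            out[i, j] = omp_support_size(D, patch, tol)
    return out
```

Blocks with high counts in the resulting map are the key-point candidates, since they need more dictionary components to be represented.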
GR · Nov 29, 2017
A Novel Image-centric Approach Towards Direct Volume Rendering
Naimul Khan, Riadh Ksantini, Ling Guan
Transfer Function (TF) generation is a fundamental problem in Direct Volume Rendering (DVR). A TF maps voxels to color and opacity values to reveal inner structures. Existing TF tools are complex and unintuitive for users, who are more likely to be medical professionals than computer scientists. In this paper, we propose a novel image-centric method for TF generation in which, instead of using complex tools, the user directly manipulates volume data to generate the DVR. The user's work is further simplified by presenting only the most informative volume slices for selection. Based on the selected parts, the voxels are classified using our novel Sparse Nonparametric Support Vector Machine classifier, which combines both local and near-global distributional information of the training data. The voxel classes are mapped to aesthetically pleasing and distinguishable color and opacity values using harmonic colors. Experimental results on several benchmark datasets and a detailed user survey show the effectiveness of the proposed method.
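To make the TF definition concrete, here is a minimal sketch of a conventional hand-specified 1D transfer function, the kind of manual tool the image-centric method aims to replace. It maps each scalar voxel value to RGBA by piecewise-linear interpolation between control points. The control-point format and function name are hypothetical; the paper's method instead derives the mapping from voxel classes produced by its SVM classifier and harmonic colors.

```python
import numpy as np

def apply_transfer_function(volume, control_points):
    """Map scalar voxel values to RGBA via piecewise-linear interpolation.

    control_points: sequence of (value, r, g, b, a) sorted by value; channels in [0, 1].
    Returns an array of shape volume.shape + (4,). Values outside the control range
    are clamped to the end points (np.interp behaviour).
    """
    cp = np.asarray(control_points, dtype=float)
    xs = cp[:, 0]
    v = np.asarray(volume, dtype=float)
    return np.stack([np.interp(v, xs, cp[:, c]) for c in range(1, 5)], axis=-1)
```

Choosing these control points by hand is exactly the step that is difficult for non-expert users, which motivates classifying voxels from user-selected slices instead.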
MM · Nov 29, 2017
Real-Time System for Human Activity Analysis
Randy Tan, Naimul Khan, Ling Guan
We propose a real-time human activity analysis system in which a user's activity can be quantitatively evaluated against a ground-truth recording. We use two Kinects to solve the problem of self-occlusion by extracting optimal joint positions using Singular Value Decomposition (SVD) and Sequential Quadratic Programming (SQP). Incremental Dynamic Time Warping (IDTW) is used to compare the user with the expert (ground truth) and quantitatively score the user's performance. Furthermore, the user's performance is displayed through a visual feedback system, where colors on the skeleton represent the user's score. Our experiments use a motion capture suit as ground truth to compare our dual-Kinect setup with a single Kinect. We also show that with our visual feedback method, users gain a statistically significant boost in learning compared with watching a simple video.
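IDTW is named but not specified in the abstract. One plausible reading, assumed here, is standard DTW whose cost matrix is filled one column per incoming user frame, so a running alignment cost against the expert recording is available in real time. The class name and the Euclidean frame distance are illustrative, and how the paper turns the DTW cost into a user score is not given.

```python
import numpy as np

class IncrementalDTW:
    """DTW between a fixed reference sequence and a stream arriving one frame at a time.

    Only the latest DTW column is kept, so each new frame costs O(len(reference)).
    """

    def __init__(self, reference):
        self.ref = np.asarray(reference, dtype=float)
        if self.ref.ndim == 1:
            self.ref = self.ref[:, None]  # treat scalars as 1D feature vectors
        self.col = None  # cumulative costs for the frames seen so far

    def update(self, frame):
        """Add one user frame; return the DTW cost of the stream so far vs. the full reference."""
        f = np.atleast_1d(np.asarray(frame, dtype=float))
        local = np.linalg.norm(self.ref - f, axis=1)  # distance to every reference frame
        if self.col is None:
            new = np.cumsum(local)  # first column: only vertical steps are possible
        else:
            new = np.empty_like(local)
            new[0] = self.col[0] + local[0]
            for i in range(1, len(local)):
                # D[i, j] = d(i, j) + min(D[i, j-1], D[i-1, j-1], D[i-1, j])
                new[i] = local[i] + min(self.col[i], self.col[i - 1], new[i - 1])
        self.col = new
        return float(new[-1])
```

For example, feeding the reference motion performed more slowly (frames repeated) keeps the cost at zero, which is why DTW suits comparing a learner and an expert who move at different speeds.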