Richard Jiang

CV
h-index: 35
29 papers
813 citations
Novelty: 36%
AI Score: 35

29 Papers

AI · Oct 30, 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions

Luca Longo, Mario Brcic, Federico Cabitza et al.

As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.

CV · Oct 11, 2022 · Code
Semantic Segmentation under Adverse Conditions: A Weather and Nighttime-aware Synthetic Data-based Approach

Abdulrahman Kerim, Felipe Chamone, Washington Ramos et al.

Recent semantic segmentation models perform well under standard weather conditions and sufficient illumination but struggle with adverse weather conditions and nighttime. Collecting and annotating training data under these conditions is expensive, time-consuming, error-prone, and not always practical. Usually, synthetic data is used as a feasible data source to increase the amount of training data. However, directly using synthetic data may actually harm the model's performance under normal weather conditions while yielding only small gains in adverse situations. Therefore, we present a novel architecture specifically designed for using synthetic training data for domain adaptation. We propose a simple yet powerful addition to DeepLabV3+ by using weather and time-of-the-day supervisors trained with multi-task learning, making it both weather and nighttime aware, which improves its mIoU accuracy by 14 percentage points on the ACDC dataset while maintaining a score of 75% mIoU on the Cityscapes dataset. Our code is available at https://github.com/lsmcolab/Semantic-Segmentation-under-Adverse-Conditions.
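
The multi-task idea in the abstract above (a segmentation head plus weather and time-of-day supervisors) can be sketched as a weighted sum of cross-entropy losses. This is only a minimal illustration, not the authors' implementation: the loss weights `w_weather` and `w_time`, the class counts, and all function names are assumptions, and plain Python lists stand in for network outputs.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(seg_logits, seg_labels,
                   weather_logits, weather_label,
                   time_logits, time_label,
                   w_weather=0.1, w_time=0.1):
    """Segmentation loss plus two auxiliary supervisor losses.

    seg_logits is a list of per-pixel logit vectors; the auxiliary heads
    each predict one label per image. The weights are hypothetical.
    """
    seg = sum(cross_entropy(l, y) for l, y in zip(seg_logits, seg_labels))
    seg /= len(seg_labels)
    weather = cross_entropy(weather_logits, weather_label)
    time_of_day = cross_entropy(time_logits, time_label)
    return seg + w_weather * weather + w_time * time_of_day
```

The auxiliary terms only shape the shared features during training; at inference time the segmentation head can be used on its own.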

CV · Jun 25, 2022
Review on Social Behavior Analysis of Laboratory Animals: From Methodologies to Applications

Ziping Jiang, Paul L. Chazot, Richard Jiang

As the bridge between genetic and physiological aspects, animal behaviour analysis is one of the most significant topics in biology and ecological research. However, identifying, tracking and recording animal behaviour are labour-intensive tasks that require professional knowledge. To reduce the cost of annotating data, researchers turn to computer vision techniques for automatic labelling algorithms, since most of the data are recorded visually. In this work, we explore a variety of behaviour detection algorithms, covering traditional vision methods, statistical methods and deep learning methods. The objective of this work is to provide a thorough investigation of related work, furnishing biologists with an overview of efficient animal behaviour detection methods. In addition, we discuss the strengths and weaknesses of those algorithms to provide some insights for those who already work in this field.

CV · Jun 25, 2022
Machine Learning-based Biological Ageing Estimation Technologies: A Survey

Zhaonian Zhang, Richard Jiang, Danny Crookes et al.

In recent years, various methods of estimating Biological Age (BA) have been developed. With the development of machine learning (ML) in particular, the variety of BA prediction methods has grown and their accuracy has greatly improved. Models for estimating BA play an important role in monitoring healthy aging, and could provide new tools to detect health status in the general population and give warnings to sub-healthy people. We mainly review three ML-based age prediction methods, which are based on blood biomarkers, facial images, and structural neuroimaging features. For now, the model using blood biomarkers is the simplest, most direct, and most accurate method. The facial image method is affected by factors such as race and environment, so its prediction accuracy is limited and it cannot yet make a great contribution to the medical field. In summary, we track the way forward in the era of big data for us and other potential general populations and show ways to leverage the vast amounts of data available today.

CV · Aug 26, 2022
Leveraging Synthetic Data to Learn Video Stabilization Under Adverse Conditions

Abdulrahman Kerim, Washington L. S. Ramos, Leandro Soriano Marcolino et al.

Video stabilization plays a central role in improving video quality. However, despite the substantial progress made by video stabilization methods, they have mainly been tested under standard weather and lighting conditions, and may perform poorly under adverse conditions. In this paper, we propose a synthetic-aware adverse weather robust algorithm for video stabilization that does not require real data and can be trained only on synthetic data. We also present Silver, a novel rendering engine to generate the required training data with an automatic ground-truth extraction procedure. Our approach uses our specially generated synthetic data for training an affine transformation matrix estimator, avoiding the feature extraction issues faced by current methods. Additionally, since no video stabilization datasets under adverse conditions are available, we propose the novel VSAC105Real dataset for evaluation. We compare our method to five state-of-the-art video stabilization algorithms using two benchmarks. Our results show that current approaches perform poorly in at least one weather condition, and that, even when training on a small dataset with synthetic data only, we achieve the best performance in terms of stability average score, distortion score, success rate, and average cropping ratio when considering all weather conditions. Hence, our video stabilization model generalizes well on real-world videos and does not require large-scale synthetic training data to converge.

CV · Jul 9, 2023
Marine Debris Detection in Satellite Surveillance using Attention Mechanisms

Ao Shen, Yijie Zhu, Richard Jiang

Marine debris is an important issue for environmental protection, but current methods for locating marine debris are still limited. To achieve higher efficiency and wider applicability in the localization of marine debris, this study combines the instance segmentation of YOLOv7 with different attention mechanisms and explores the best model. Using a labelled dataset of satellite images containing ocean debris, we examined three attention models: lightweight coordinate attention, CBAM (combining spatial and channel focus), and bottleneck transformer (based on self-attention). Box detection assessment revealed that CBAM achieved the best outcome (F1 score of 77%) compared to coordinate attention (F1 score of 71%) and YOLOv7/bottleneck transformer (both F1 scores around 66%). Mask evaluation showed CBAM again leading with an F1 score of 73%, whereas coordinate attention and YOLOv7 had comparable performances (F1 scores of around 68% and 69%) and bottleneck transformer lagged behind at an F1 score of 56%. These findings suggest that CBAM is the most suitable of the tested models for detecting marine debris. However, it should be noted that the bottleneck transformer detected some areas missed by manual annotation and displayed better mask precision for larger debris pieces, signifying potentially superior practical performance.

IV · Mar 5, 2025 · Code
ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation

Saqib Qamar, Syed Furqan Qadri, Roobaea Alroobaea et al.

Melanoma is a malignant tumor that originates from skin cell lesions. Accurate and efficient segmentation of skin lesions is essential for quantitative analysis but remains a challenge due to blurred lesion boundaries, gradual color changes, and irregular shapes. To address this, we propose ScaleFusionNet, a hybrid model that integrates a Cross-Attention Transformer Module (CATM) and adaptive fusion block (AFB) to enhance feature extraction and fusion by capturing both local and global features. We introduce CATM, which utilizes Swin transformer blocks and Cross Attention Fusion (CAF) to adaptively refine feature fusion and reduce semantic gaps in the encoder-decoder to improve segmentation accuracy. Additionally, the AFB uses Swin Transformer-based attention and deformable convolution-based adaptive feature extraction to help the model gather local and global contextual information through parallel pathways. This enhancement refines the lesion boundaries and preserves fine-grained details. ScaleFusionNet achieves Dice scores of 92.94% and 91.80% on the ISIC-2016 and ISIC-2018 datasets, respectively, demonstrating its effectiveness in skin lesion analysis. In addition, independent validation experiments were conducted on the PH$^2$ dataset using the pretrained model weights. The results show that ScaleFusionNet demonstrates significant performance improvements compared with other state-of-the-art methods. Our code implementation is publicly available on GitHub.
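
The Dice score reported above measures the overlap between a predicted and a reference segmentation mask. The sketch below shows the standard definition on small binary masks; the function name, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details taken from the paper.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for binary masks (nested lists of 0/1)."""
    intersection = pred_total = true_total = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += p * t
            pred_total += p
            true_total += t
    if pred_total + true_total == 0:
        # Both masks are empty: treat as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / (pred_total + true_total)
```

A prediction covering two pixels, one of which matches a one-pixel lesion, scores 2 * 1 / (2 + 1), roughly 0.67.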

LG · Dec 6, 2024 · Code
Multi-Armed Bandit Approach for Optimizing Training on Synthetic Data

Abdulrahman Kerim, Leandro Soriano Marcolino, Erickson R. Nascimento et al.

Supervised machine learning methods require large-scale training datasets to perform well in practice. Synthetic data has been showing great progress recently and has been used as a complement to real data. However, there is still a strong need to assess the usability of synthetically generated data. To this end, we propose a novel UCB-based training procedure combined with a dynamic usability metric. Our proposed metric integrates low-level and high-level information from synthetic images and their corresponding real and synthetic datasets, surpassing existing traditional metrics. Utilizing a UCB-based dynamic approach ensures continual enhancement of model learning. Unlike other approaches, our method effectively adapts to changes in the machine learning model's state and considers the evolving utility of training samples during the training process. We show that our metric is an effective way to rank synthetic images based on their usability. Furthermore, we propose a new attribute-aware bandit pipeline for generating synthetic data by integrating a Large Language Model with Stable Diffusion. Quantitative results show that our approach can boost the performance of a wide range of supervised classifiers. Notably, we observed an improvement of up to 10% in classification accuracy compared to traditional approaches, demonstrating the effectiveness of our approach. Our source code, datasets, and additional materials are publicly available at https://github.com/A-Kerim/Synthetic-Data-Usability-2024.
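
For readers unfamiliar with UCB, the sketch below shows the textbook UCB1 rule that such a training procedure could build on. It is not the authors' method: here each arm is imagined as a group of synthetic samples and `reward_fn` is a hypothetical stand-in for a usability signal, whereas the paper's metric and pipeline are more involved.

```python
import math


def ucb1_select(counts, values, t):
    """Pick the arm with the highest upper confidence bound at round t."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the bound
    scores = [
        values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, rounds):
    """Run UCB1, keeping a running mean reward per arm."""
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

The exploration bonus shrinks as an arm is pulled more often, so the procedure keeps sampling from less-used groups occasionally while concentrating on the ones with the highest observed reward.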

AI · Feb 7, 2024
Advancing Explainable AI Toward Human-Like Intelligence: Forging the Path to Artificial Brain

Yongchen Zhou, Richard Jiang

The intersection of Artificial Intelligence (AI) and neuroscience in Explainable AI (XAI) is pivotal for enhancing transparency and interpretability in complex decision-making processes. This paper explores the evolution of XAI methodologies, ranging from feature-based to human-centric approaches, and delves into their applications in diverse domains, including healthcare and finance. The challenges in achieving explainability in generative models, ensuring responsible AI practices, and addressing ethical implications are discussed. The paper further investigates the potential convergence of XAI with cognitive sciences, the development of emotionally intelligent AI, and the quest for Human-Like Intelligence (HLI) in AI systems. As AI progresses towards Artificial General Intelligence (AGI), considerations of consciousness, ethics, and societal impact become paramount. The ongoing pursuit of deciphering the mysteries of the brain with AI and the quest for HLI represent transformative endeavors, bridging technical advancements with multidisciplinary explorations of human cognition.

CV · Jan 13, 2024
Triamese-ViT: A 3D-Aware Method for Robust Brain Age Estimation from MRIs

Zhaonian Zhang, Richard Jiang

The integration of machine learning in medicine has significantly improved diagnostic precision, particularly in the interpretation of complex structures like the human brain. Diagnosing challenging conditions such as Alzheimer's disease has prompted the development of brain age estimation techniques. These methods often leverage three-dimensional Magnetic Resonance Imaging (MRI) scans, with recent studies emphasizing the efficacy of 3D convolutional neural networks (CNNs) like 3D ResNet. However, the potential of Vision Transformers (ViTs), known for their accuracy and interpretability, remains untapped in this domain due to limitations in their 3D versions. This paper introduces Triamese-ViT, an innovative adaptation of the ViT model for brain age estimation. Our model uniquely combines ViTs from three different orientations to capture 3D information, significantly enhancing accuracy and interpretability. Tested on a dataset of 1351 MRI scans, Triamese-ViT achieves a Mean Absolute Error (MAE) of 3.84, a 0.9 Spearman correlation coefficient with chronological age, and a -0.29 Spearman correlation coefficient between the brain age gap (BAG) and chronological age, significantly better than previous methods for brain age estimation. A key innovation of Triamese-ViT is its capacity to generate a comprehensive 3D-like attention map, synthesized from 2D attention maps of each orientation-specific ViT. This feature is particularly beneficial for in-depth brain age analysis and disease diagnosis, offering deeper insights into brain health and the mechanisms of age-related neural changes.
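
The three evaluation figures quoted above (MAE, Spearman correlation with chronological age, and Spearman correlation between the brain age gap and chronological age) can be computed as in the following sketch. The function names and the toy ages are illustrative assumptions; the brain age gap is taken here as predicted age minus chronological age.

```python
def mean_absolute_error(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up for illustration only).
chronological = [20, 30, 40, 50]
predicted = [22, 29, 43, 48]
brain_age_gap = [p - c for p, c in zip(predicted, chronological)]
```

A gap-versus-age correlation close to zero is usually the desirable outcome, since it indicates that the prediction error does not depend systematically on the subject's age.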

MN · Aug 6, 2025
Alz-QNet: A Quantum Regression Network for Studying Alzheimer's Gene Interactions

Debanjan Konar, Neerav Sreekumar, Richard Jiang et al.

Understanding the molecular-level mechanisms underpinning Alzheimer's disease (AD) by studying crucial genes associated with the disease remains a challenge. Alzheimer's, being a multifactorial disease, requires understanding the gene-gene interactions underlying it for theranostics and progress. In this article, a novel attempt has been made using quantum regression to decode how some crucial genes in AD, such as Amyloid Beta Precursor Protein ($APP$), Fibroblast Growth Factor 14 ($FGF14$), Yin Yang 1 ($YY1$), and Phospholipase D Family Member 3 ($PLD3$), become influenced by other prominent switching genes during disease progression, which may help in gene expression-based therapy for AD. Our proposed Quantum Regression Network (Alz-QNet) introduces a pioneering approach with insights from the state-of-the-art Quantum Gene Regulatory Networks (QGRN) to unravel the gene interactions involved in AD pathology, particularly within the Entorhinal Cortex (EC), where early pathological changes occur. Using the proposed Alz-QNet framework, we explore the interactions between key genes ($APP$, $FGF14$, $YY1$, $EGR1$, $GAS7$, $AKT3$, $SREBF2$, and $PLD3$), all of which are believed to play a crucial role in the progression of AD, within the EC microenvironment of AD patients, studying genetic samples from the database $GSE138852$. Our investigation uncovers intricate gene-gene interactions, shedding light on the potential regulatory mechanisms that underlie the pathogenesis of AD, which helps us to find potential gene inhibitors or regulators for theranostics.

CV · Jan 18, 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder

Yongchen Zhou, Richard Jiang

In the domain of computer vision, the restoration of missing information in video frames is a critical challenge, particularly in applications such as autonomous driving and surveillance systems. This paper introduces the Siamese Masked Conditional Variational Autoencoder (SiamMCVAE), leveraging a siamese architecture with twin encoders based on vision transformers. This innovative design enhances the model's ability to comprehend lost content by capturing intrinsic similarities between paired frames. SiamMCVAE proficiently reconstructs missing elements in masked frames, effectively addressing issues arising from camera malfunctions through variational inferences. Experimental results robustly demonstrate the model's effectiveness in restoring missing information, thus enhancing the resilience of computer vision systems. The incorporation of Siamese Vision Transformer (SiamViT) encoders in SiamMCVAE exemplifies promising potential for addressing real-world challenges in computer vision, reinforcing the adaptability of autonomous systems in dynamic environments.

CV · Mar 29, 2022
Self-Supervised Leaf Segmentation under Complex Lighting Conditions

Xufeng Lin, Chang-Tsun Li, Scott Adams et al.

As an essential prerequisite task in image-based plant phenotyping, leaf segmentation has garnered increasing attention in recent years. While self-supervised learning is emerging as an effective alternative for various computer vision tasks, its adaptation for image-based plant phenotyping remains rather unexplored. In this work, we present a self-supervised leaf segmentation framework consisting of a self-supervised semantic segmentation model, a color-based leaf segmentation algorithm, and a self-supervised color correction model. The self-supervised semantic segmentation model groups the semantically similar pixels by iteratively referring to the self-contained information, allowing the pixels of the same semantic object to be jointly considered by the color-based leaf segmentation algorithm for identifying the leaf regions. Additionally, we propose to use a self-supervised color correction model for images taken under complex illumination conditions. Experimental results on datasets of different plant species demonstrate the potential of the proposed self-supervised framework in achieving effective and generalizable leaf segmentation.

AI · May 16, 2021
Private Facial Diagnosis as an Edge Service for Parkinson's DBS Treatment Valuation

Richard Jiang, Paul Chazot, Danny Crookes et al.

Facial phenotyping has recently been successfully exploited for medical diagnosis as a novel way to diagnose a range of diseases, where facial biometrics has been revealed to have rich links to underlying genetic or medical causes. In this paper, taking Parkinson's Disease (PD) as a case study, we propose an Artificial-Intelligence-of-Things (AIoT) edge-oriented privacy-preserving facial diagnosis framework to analyze the treatment of Deep Brain Stimulation (DBS) on PD patients. In the proposed framework, a new edge-based information theoretically secure framework is proposed to implement private deep facial diagnosis as a service over a privacy-preserving AIoT-oriented multi-party communication scheme, where partial homomorphic encryption (PHE) is leveraged to enable privacy-preserving deep facial diagnosis directly on encrypted facial patterns. In our experiments with a facial dataset collected from PD patients, we demonstrated for the first time that facial patterns could be used to evaluate the improvement of PD patients undergoing DBS treatment. We further implemented a privacy-preserving deep facial diagnosis framework that can achieve the same accuracy as the non-encrypted one, showing the potential of our privacy-preserving facial diagnosis as a trustworthy edge service for grading the severity of PD in patients.
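
To illustrate how partial homomorphic encryption lets a server compute on encrypted features, here is a toy sketch using the Paillier scheme, which is additively homomorphic. The abstract does not say which PHE scheme the authors use, so Paillier is an assumed stand-in; the tiny primes, the integer features, and all function names are for illustration only and are not secure.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_dot(n, enc_x, weights):
    """Weighted sum of encrypted values with plaintext non-negative
    integer weights, computed without decrypting any input."""
    n2 = n * n
    acc = 1  # a valid encryption of 0
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc
```

Multiplying ciphertexts adds the underlying plaintexts, and raising a ciphertext to a plaintext power scales it, which is enough to evaluate a linear layer on encrypted inputs; non-linear steps need additional protocol support.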

ML · Feb 12, 2021
Robust and integrative Bayesian neural networks for likelihood-free parameter inference

Fredrik Wrede, Robin Eriksson, Richard Jiang et al.

State-of-the-art neural network-based methods for learning summary statistics have delivered promising results for simulation-based likelihood-free parameter inference. Existing approaches require density estimation as a post-processing step building upon deterministic neural networks, and do not take network prediction uncertainty into account. This work proposes a robust integrated approach that learns summary statistics using Bayesian neural networks, and directly estimates the posterior density using categorical distributions. An adaptive sampling scheme selects simulation locations to efficiently and iteratively refine the predictive posterior of the network conditioned on observations. This allows for more efficient and robust convergence on comparatively large prior spaces. We demonstrate our approach on benchmark examples and compare against related methods.

CV · Oct 16, 2020
Deep Learning based Automated Forest Health Diagnosis from Aerial Images

Chia-Yen Chiang, Chloe Barnes, Plamen Angelov et al.

Global climate change has had a drastic impact on our environment. Previous studies have shown that pest outbreaks driven by global climate change may kill a tremendous number of trees, which inevitably become a factor in forest fires. An important indicator of forest fire risk is the condition of forests. Aerial image-based forest analysis enables early detection of dead and living trees. In this paper, we apply a synthetic method to enlarge the imagery dataset and present a new framework for automated dead tree detection from aerial images using a re-trained Mask RCNN (Mask Region-based Convolutional Neural Network) approach with a transfer learning scheme. We apply our framework to our aerial imagery datasets and compare eight fine-tuned models. The mean average precision (mAP) score for the best of these models reaches 54%. Following the automated detection, we are able to automatically produce dead tree masks and count them to label the dead trees in an image, as an indicator of forest health that could be linked to the causal analysis of environmental changes and the predictive likelihood of forest fire.

HC · May 16, 2020
A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation

Fraser Young, L Zhang, Richard Jiang et al.

With the recent boom in artificial intelligence (AI), particularly deep learning techniques, digital healthcare is one of the prevalent areas that could benefit from AI-enabled functionality. This research presents a novel AI-enabled Internet of Things (IoT) device operating on the ESP-8266 platform, capable of assisting those who suffer from hearing impairment or deafness to communicate with others in conversations. In the proposed solution, a server application is created that leverages Google's online speech recognition service to convert the received conversations into text, which is then sent to a micro-display attached to the glasses to show the conversation contents to deaf people, enabling and assisting normal conversation with the general population. Furthermore, in order to raise alerts about traffic or dangerous scenarios, an 'urban-emergency' classifier is developed using a deep learning model, Inception-v4, with transfer learning to detect and recognize alerting or alarming sounds, such as a horn sound or a fire alarm, with text generated to alert the prospective user. The training of Inception-v4 was carried out on a consumer desktop PC and the model was then deployed in the AI-based IoT application. The empirical results indicate that the developed prototype system achieves an accuracy rate of 92% for sound recognition and classification with real-time performance.

SP · May 4, 2020
3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

David Lonsdale, Li Zhang, Richard Jiang

In this paper, we present our work on developing a robot-arm prosthetic via deep learning. Our work proposes to use transfer learning techniques applied to the Google Inception model to retrain the final layer for surface electromyography (sEMG) classification. Data were collected using the Thalmic Labs Myo Armband and used to generate graph images comprising 8 subplots per image, containing sEMG data captured from 40 data points per sensor, corresponding to the array of 8 sEMG sensors in the armband. The captured data were then classified into four categories (Fist, Thumbs Up, Open Hand, Rest) using a deep learning model, Inception-v3, with transfer learning to train the model for accurate prediction of each category on real-time input of new data. This trained model was then downloaded to the ARM processor-based embedded system to enable the brain-controlled robot-arm prosthetic manufactured with our 3D printer. To test the functionality of the method, a robotic arm was produced using a 3D printer and off-the-shelf hardware to control it. SSH communication protocols are employed to execute Python files hosted on an embedded Raspberry Pi with ARM processors to trigger movement of the robot arm according to the predicted gesture.

CV · Apr 27, 2020
In-Vehicle Object Detection in the Wild for Driverless Vehicles

Ranjith Dinakaran, Li Zhang, Richard Jiang

In-vehicle human object identification plays an important role in vision-based automated vehicle driving systems, where objects such as pedestrians and vehicles on roads or streets are the primary targets to protect from driverless vehicles. A challenge is the difficulty of detecting moving objects under wild conditions, where illumination and image quality can vary drastically. In this work, to address this challenge, we exploit Deep Convolutional Generative Adversarial Networks (DCGANs) with a Single Shot Detector (SSD) to handle the wild conditions. In our work, a GAN was trained with low-quality images to handle the challenges arising from the wild conditions in smart cities, while a cascaded SSD is employed as the object detector to work with the GAN. We tested our approach under wild conditions using taxi driver videos on London streets in both daylight and at night, and the tests on in-vehicle videos demonstrate that this strategy can achieve a drastically better detection rate under the wild conditions.

CR · Sep 14, 2019
Biometric Blockchain: A Secure Solution for Intelligent Vehicle Data Sharing

Bing Xu, Tobechukwu Agbele, Qiang Ni et al.

The intelligent vehicle (IV) has become a promising technology that could revolutionize our life in smart cities sooner or later. However, it still suffers from many security vulnerabilities. Traditional security methods are incapable of securing IV data sharing against malicious attacks. Blockchain, as expected by both research and industry communities, has emerged as a good solution to address these issues. The major issues in IV data sharing are trust, data accuracy and reliability of data sharing in the communication channel. Blockchain technology, previously used for cryptocurrency, has recently been applied to build trust and reliability in peer-to-peer networks with topologies similar to those of IV data sharing. In this chapter, we present a new framework, namely biometric blockchain (BBC), for secure IV data sharing. In our new scheme, biometric information is exploited as a cue to record who is responsible in the data sharing activities, while the proposed BBC technology serves as the backbone of the IV data-sharing architecture. Hence, the proposed BBC technology provides a more reliable trust environment between the vehicles while personal identities are traceable in the proposed new scheme.

CV · Sep 11, 2019
Automated Blood Cell Detection and Counting via Deep Learning for Microfluidic Point-of-Care Medical Devices

Tiancheng Xia, Richard Jiang, YongQing Fu et al.

Automated in-vitro cell detection and counting have been a key theme for artificial and intelligent biological analysis such as biopsy, drug analysis and disease diagnosis. Along with the rapid development of microfluidics and lab-on-chip technologies, in-vitro live cell analysis has been one of the critical tasks for both research and industry communities. However, it is a great challenge to obtain and then predict the precise information of live cells from numerous microscopic videos and images. In this paper, we investigated in-vitro detection of white blood cells using deep neural networks, and discussed how state-of-the-art machine learning techniques could fulfil the needs of medical diagnosis. The approach we used in this study was based on Faster Region-based Convolutional Neural Networks (Faster RCNNs), and a transfer learning process was used to adapt this technique to the microscopic detection of blood cells. Our experimental results demonstrated that fast and efficient analysis of blood cells via automated microscopic imaging can achieve much better accuracy and faster speed than the conventionally applied methods, implying a promising future for this technology in microfluidic point-of-care medical devices.

CV · Sep 11, 2019
Computer-Aided Automated Detection of Gene-Controlled Social Actions of Drosophila

Khan Faraz, Ahmed Bouridane, Richard Jiang et al.

Gene expression of social actions in Drosophilae has been attracting wide interest from biologists, medical scientists and psychologists. Gene-edited Drosophilae have been used as a test platform for experimental investigation. For example, Parkinson's genes can be embedded into a group of newly bred Drosophilae for research purposes. However, long-term human observation of numerous tiny Drosophilae is arduous work, and dependence on acute human perception is highly unreliable. As a result, an automated system for social action detection using machine learning is in high demand. In this study, we propose to automate the detection and classification of two innate aggressive actions demonstrated by Drosophilae. Robust keypoint detection is achieved using selective spatio-temporal interest points (sSTIP), which are then described using 3D Scale Invariant Feature Transform (3D-SIFT) descriptors. Dimensionality reduction is performed using Spectral Regression Kernel Discriminant Analysis (SR-KDA) and classification is done using the nearest centre rule. The classification accuracy shown demonstrates the feasibility of the proposed system.
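
The final classification step mentioned above, the nearest centre rule, assigns a sample to the class whose mean feature vector is closest. The sketch below assumes Euclidean distance on already-reduced feature vectors; the function names, the 2D toy features, and the class labels `action_a` and `action_b` are hypothetical.

```python
def class_centres(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_centre(x, centres):
    """Return the label whose centre has the smallest squared distance to x."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(x, centre))
    return min(centres, key=lambda label: sq_dist(centres[label]))


# Toy training set (made up for illustration).
train_x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
train_y = ["action_a", "action_a", "action_b", "action_b"]
centres = class_centres(train_x, train_y)
```

Because only one centre per class is stored, prediction cost grows with the number of classes rather than with the number of training samples.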

LG · Sep 5, 2019
Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis

Gary Storey, Ahmed Bouridane, Richard Jiang et al.

While facial biometrics has been widely used for identification purposes, it has recently been researched as medical biometrics for a range of diseases. In this chapter, we investigate facial landmark detection for atypical 3D facial modelling in facial palsy cases, where such modelling can potentially assist medical diagnosis using atypical facial features. In our work, landmark localisation methods such as stacked hourglass networks are studied and evaluated to ascertain their accuracy when presented with unseen atypical faces. The evaluation highlights that the state-of-the-art stacked hourglass architecture outperforms other traditional methods.

CV · Jul 21, 2019
Shallow Unorganized Neural Networks using Smart Neuron Model for Visual Perception

Richard Jiang, Danny Crookes

The recent success of Deep Neural Networks (DNNs) has revealed the significant capability of neural computing in many challenging applications. Although DNNs are derived from emulating biological neurons, there still exist doubts over whether or not DNNs are the final and best model to emulate the mechanism of human intelligence. In particular, there are two discrepancies between computational DNN models and the observed facts of biological neurons. First, human neurons are interconnected randomly, while DNNs need carefully-designed architectures to work properly. Second, human neurons usually have a long spiking latency (~100ms) which implies that not many layers can be involved in making a decision, while DNNs could have hundreds of layers to guarantee high accuracy. In this paper, we propose a new computational model, namely shallow unorganized neural networks (SUNNs), in contrast to ANNs/DNNs. The proposed SUNNs differ from standard ANNs or DNNs in three fundamental aspects: 1) SUNNs are based on an adaptive neuron cell model, Smart Neurons, that allows each artificial neuron cell to adaptively respond to its inputs rather than carrying out a fixed weighted-sum operation like the classic neuron model in ANNs/DNNs; 2) SUNNs can cope with computational tasks with very shallow architectures; 3) SUNNs have a natural topology with random interconnections, as the human brain does, and as proposed by Turing's B-type unorganized machines. We implemented the proposed SUNN architecture and tested it on a number of unsupervised early stage visual perception tasks. Surprisingly, such simple shallow architectures achieved very good results in our experiments. The success of our new computational model makes it the first workable example of Turing's B-Type unorganized machine that can achieve comparable or better performance against the state-of-the-art algorithms.

CRJul 21, 2019
Biometric Blockchain: A Better Solution for the Security and Trust of Food Logistics

Bing Xu, Tobechukwu Agbele, Richard Jiang

Blockchain has been emerging as a promising technology that could totally change the landscape of data security in the coming years, particularly for data access over Internet-of-Things and cloud servers. However, blockchain itself, though secured by its protocol, does not identify who owns the data and who uses the data. Rather than simply encrypting data into keys, in this paper we propose a protocol called Biometric Blockchain (BBC) that explicitly incorporates the biometric cues of individuals to unambiguously identify the creators and users in a blockchain-based system, particularly to address the increasing need to secure food logistics, following the recent, widely reported incident of wrongly labelled food that caused the death of a customer on a flight. The advantage of using BBC in food logistics is clear: it can not only identify whether the data or labels are authentic, but also clearly record who is responsible for the secured data or labels. As a result, such a BBC-based solution can greatly ease the difficulty of controlling the risks accompanying food logistics, such as faked foods or wrong ingredient labels.

CVMay 31, 2019
3DPalsyNet: A Facial Palsy Grading and Motion Recognition Framework using Fully 3D Convolutional Neural Networks

Gary Storey, Richard Jiang, Shelagh Keogh et al.

The capability to perform facial analysis from video sequences has significant potential to have a positive impact in many areas of life. One such area is the medical domain, specifically to aid in the diagnosis and rehabilitation of patients with facial palsy. With this application in mind, this paper presents an end-to-end framework, named 3DPalsyNet, for the tasks of mouth motion recognition and facial palsy grading. 3DPalsyNet utilizes a 3D CNN architecture with a ResNet backbone for the prediction of these dynamic tasks. Leveraging transfer learning from a 3D CNN pre-trained on the Kinetics data set for general action recognition, the model is modified to apply joint supervised learning using center and softmax loss concepts. 3DPalsyNet is evaluated on a test set consisting of individuals with varying ranges of facial palsy and mouth motions, and the results show an attractive level of classification accuracy in these tasks of 82% and 86% respectively. The effects of frame duration and loss function on the predictive qualities of the proposed 3DPalsyNet were studied, and it was found that a shorter frame duration of 8 performed best for this specific task. Combining center loss with softmax loss improved spatio-temporal feature learning over softmax loss alone, which is in agreement with earlier work involving the spatial domain.

CVMay 29, 2019
Distant Pedestrian Detection in the Wild using Single Shot Detector with Deep Convolutional Generative Adversarial Networks

Ranjith Dinakaran, Philip Easom, Li Zhang et al.

In this work, we examine the feasibility of applying Deep Convolutional Generative Adversarial Networks (DCGANs) with a Single Shot Detector (SSD) as a data-processing technique to handle the challenge of pedestrian detection in the wild. Specifically, we attempted to use in-fill completion (where a portion of the image is masked) to generate random transformations of images with portions missing to expand existing labelled datasets. In our work, the GAN was trained intensively on low-resolution images in order to neutralize the challenges of pedestrian detection in the wild, and we considered humans and a few other classes for detection in smart cities. The object detection experiment, performed by training the GAN model along with SSD, provided a substantial improvement in the results. This approach offers an interesting view of the current state of the art in GANs for object detection. We used the Canadian Institute for Advanced Research (CIFAR), Caltech, and KITTI data sets for training and testing the network under different resolutions, and the experimental results are shown as a comparison between DCGAN cascaded with SSD and SSD itself.

CVNov 18, 2018
Deep Learning based Pedestrian Detection at Distance in Smart Cities

Ranjith K Dinakaran, Philip Easom, Ahmed Bouridane et al.

Generative adversarial networks (GANs) have been promising for many computer vision problems due to their powerful capabilities to enhance the data for training and testing. In this paper, we leveraged GANs and proposed a new architecture with a cascaded Single Shot Detector (SSD) for pedestrian detection at distance, which remains a challenge due to the varied sizes of pedestrians in videos at distance. To overcome the low-resolution issues in pedestrian detection at distance, DCGAN is first employed to improve the resolution and reconstruct more discriminative features for an SSD to detect objects in images or videos. A crucial advantage of our method is that it learns a multi-scale metric to distinguish multiple objects at different distances within one image, while DCGAN serves as an encoder-decoder platform to generate parts of an image that contain better discriminative information. To measure the effectiveness of our proposed method, experiments were carried out on the Canadian Institute for Advanced Research (CIFAR) dataset, and it was demonstrated that the proposed new architecture achieved a much better detection rate, particularly on vehicles and pedestrians at distance, making it highly suitable for smart-city applications that need to discover key objects or pedestrians at distance.