Quan Bai

h-index: 23

21 papers

109 citations

Novelty: 43%

AI Score: 37

Ranked #95,396 of 194,257 authors (top 49%); #32,030 in CV (top 54%)

21 Papers

10.1 · CV · Jul 6, 2022

A Comprehensive Review on Deep Supervision: Theories and Applications

Renjie Li, Xinyi Wang, Guan Huang et al.

Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at the hidden layers of a neural network. This technique has recently been applied increasingly in deep neural network learning systems for various computer vision applications. There is a consensus that deep supervision helps improve neural network performance by alleviating the vanishing gradient problem, which is one of its many strengths. Moreover, deep supervision can be applied in different ways in different computer vision applications. How to make the best use of deep supervision to improve network performance in different applications has not been thoroughly investigated. In this paper, we provide a comprehensive, in-depth review of deep supervision in both theory and applications. We propose a new classification of deep supervision networks, and discuss the advantages and limitations of current deep supervision networks in computer vision applications.
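In its most common form, deep supervision attaches small auxiliary heads to hidden layers and adds their losses, each scaled by a weight, to the loss computed at the network output. The minimal sketch below shows only that combination step in plain Python; the helper name `deeply_supervised_loss` and the weights used are illustrative assumptions, not the review's notation or any particular network's settings.

```python
from typing import Sequence


def deeply_supervised_loss(
    main_loss: float,
    aux_losses: Sequence[float],
    aux_weights: Sequence[float],
) -> float:
    """Combine the output-layer loss with weighted hidden-layer (auxiliary) losses.

    Hypothetical helper: each entry of aux_losses is the loss of one auxiliary
    head attached to a hidden layer, and aux_weights scales its contribution.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Main loss 0.5 plus two auxiliary heads, each weighted by 0.3.
total = deeply_supervised_loss(0.5, [0.8, 0.6], [0.3, 0.3])
print(total)
```

Here the total is 0.5 + 0.3 × 0.8 + 0.3 × 0.6 = 0.92. In a real network the terms would be tensors in an autodiff framework, so the auxiliary losses send gradients directly into the hidden layers they supervise.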

1.5 · CV · Jan 18, 2023

Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

Renjie Li, Chun Yu Lao, Rebecca St. George et al.

Objective: The coordination of human movement directly reflects the function of the central nervous system. Small deficits in movement are often the first sign of an underlying neurological problem. The objective of this research is to develop a new end-to-end, deep learning-based system, Rapid-Motion-Track (RMT), that can accurately track the fastest human movements when webcams or laptop cameras are used. Materials and Methods: We applied RMT to finger tapping, a well-validated test of motor control that is one of the most challenging human motions to track with computer vision due to the small keypoints of the digits and the high velocities generated. We recorded 160 finger tapping assessments simultaneously with a standard 2D laptop camera (30 frames/sec) and a high-speed wearable sensor-based 3D motion tracking system (250 frames/sec). RMT and a range of DLC models were applied to the video data, with tapping frequencies up to 8 Hz, to extract movement features. Results: The movement features (e.g., speed, rhythm, variance) identified with the new RMT system exhibited very high concurrent validity with the gold-standard measurements (97.3% of RMT measures were within ±0.5 Hz of the Optotrak measures) and outperformed DLC and other advanced computer vision tools (around 88.2% of DLC measures were within ±0.5 Hz of the Optotrak measures). RMT also accurately tracked a range of other rapid human movements, such as foot tapping, head turning and sit-to-stand movements. Conclusion: With the ubiquity of video technology in smart devices, the RMT method holds potential to transform access to, and the accuracy of, human movement assessment.
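The headline agreement figure (the share of RMT measures within ±0.5 Hz of the Optotrak measures) is a tolerance count over paired recordings. The sketch below assumes tapping frequency is estimated as the reciprocal of the mean interval between detected taps; the abstract does not give RMT's exact feature definitions, so `tapping_frequency_hz`, `fraction_within_tolerance` and their inputs are hypothetical.

```python
from typing import Sequence


def tapping_frequency_hz(tap_times_s: Sequence[float]) -> float:
    """Estimate tapping frequency as 1 / mean inter-tap interval.

    Assumption: tap_times_s holds sorted timestamps (seconds) of detected taps,
    e.g. produced by a keypoint tracker; this is not RMT's published definition.
    """
    if len(tap_times_s) < 2:
        raise ValueError("need at least two taps")
    intervals = [b - a for a, b in zip(tap_times_s, tap_times_s[1:])]
    return len(intervals) / sum(intervals)


def fraction_within_tolerance(
    estimates_hz: Sequence[float],
    reference_hz: Sequence[float],
    tol_hz: float = 0.5,
) -> float:
    """Share of paired measurements with |estimate - reference| <= tol_hz."""
    if len(estimates_hz) != len(reference_hz) or not estimates_hz:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(abs(e - r) <= tol_hz for e, r in zip(estimates_hz, reference_hz))
    return hits / len(estimates_hz)


# Hypothetical camera-based vs. reference frequencies for three recordings.
print(fraction_within_tolerance([4.0, 4.6, 3.8], [4.2, 4.0, 3.8]))
```

In this toy example two of the three pairs agree within 0.5 Hz; a figure such as 97.3% is the same ratio taken over all recorded assessments.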

10.1 · CV · Oct 18, 2022

Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation

Ruijun Li, Weihua Li, Yi Yang et al.

Recently, a number of studies have shown that diffusion models perform remarkably well in text-to-image synthesis tasks, opening new research opportunities for image generation. Google's Imagen follows this research trend and outperforms DALL-E 2 as the best model for text-to-image generation. However, Imagen merely uses a T5 language model for text processing, which cannot ensure that the semantic information of the text is learned. Furthermore, the Efficient UNet leveraged by Imagen is not the best choice for image processing. To address these issues, we propose Swinv2-Imagen, a novel text-to-image diffusion model based on a Hierarchical Vision Transformer and a Scene Graph incorporating a semantic layout. In the proposed model, the feature vectors of entities and relationships are extracted and incorporated into the diffusion model, effectively improving the quality of generated images. On top of that, we introduce a Swin-Transformer-based UNet architecture, called Swinv2-Unet, which can address the problems stemming from CNN convolution operations. Extensive experiments are conducted to evaluate the performance of the proposed model on three real-world datasets, i.e., MSCOCO, CUB and MM-CelebA-HQ. The experimental results show that the proposed Swinv2-Imagen model outperforms several popular state-of-the-art methods.

3.3 · SI · Mar 17, 2022 · Code

GAC: A Deep Reinforcement Learning Model Toward User Incentivization in Unknown Social Networks

Shiqing Wu, Weihua Li, Quan Bai

In recent years, many applications have deployed incentive mechanisms to promote user attention and engagement. Most incentive mechanisms determine specific incentive values based on users' attributes (e.g., preferences), but such information is unavailable in many real-world applications. Meanwhile, due to budget restrictions, successfully incentivizing all users can be challenging. In this light, we consider leveraging social influence to maximize the incentivization result: by directly incentivizing influential users, who in turn affect more users, the cost of incentivizing those users can be decreased. However, identifying influential users in a social network requires complete information about the influence strength among users, which is impractical to acquire in real-world situations. In this research, we propose an end-to-end reinforcement learning-based framework, called Geometric Actor-Critic (GAC), to tackle the above-mentioned problem. The proposed approach can realize effective incentive allocation without prior knowledge of users' attributes. Three real-world social network datasets have been adopted in the experiments to evaluate the performance of GAC. The experimental results indicate that GAC can learn and apply effective incentive allocation policies in unknown social networks and outperforms existing incentive allocation approaches.

8.6 · HC · Nov 19, 2022

A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

Yi Yang, Zhong-Qiu Zhao, Quan Bai et al.

Due to the noise in crowdsourced labels, label aggregation (LA) has emerged as a standard procedure for post-processing crowdsourced labels. LA methods estimate true labels from crowdsourced labels by modeling worker qualities. Most existing LA methods are iterative in nature: they traverse all the crowdsourced labels multiple times in order to jointly and iteratively update true labels and worker qualities until convergence. Consequently, these methods have high space and time complexities. In this paper, we treat LA as a dynamic system and model it as a Dynamic Bayesian network. From the dynamic model we derive two light-weight algorithms, LA^onepass and LA^twopass, which can effectively and efficiently estimate worker qualities and true labels by traversing all the labels at most twice. Due to their dynamic nature, the proposed algorithms can also estimate true labels online without revisiting historical data. We theoretically prove the convergence property of the proposed algorithms and bound the error of the estimated worker qualities. We also analyze the space and time complexities of the proposed algorithms and show that they are equivalent to those of majority voting. Experiments conducted on 20 real-world datasets demonstrate that the proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings, even though they traverse all the labels at most twice.
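The abstract benchmarks the proposed algorithms' complexity against majority voting. As a point of reference only, the sketch below implements that majority-voting baseline in one pass over the labels, plus a second pass that scores each worker by agreement with the majority label. It is a generic heuristic, not LA^onepass or LA^twopass (which are derived from the paper's Dynamic Bayesian network model); the function names and the (item, worker, label) triple format are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# Assumed input format: one (item_id, worker_id, label) triple per crowdsourced label.
Triple = Tuple[Hashable, Hashable, Hashable]


def majority_vote(labels: List[Triple]) -> Dict[Hashable, Hashable]:
    """Pass 1: pick the most frequent label per item (ties go to the label seen first)."""
    votes: Dict[Hashable, Counter] = defaultdict(Counter)
    for item, _worker, label in labels:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(
    labels: List[Triple], truth: Dict[Hashable, Hashable]
) -> Dict[Hashable, float]:
    """Pass 2: fraction of each worker's labels that match the estimated true label."""
    hits: Counter = Counter()
    total: Counter = Counter()
    for item, worker, label in labels:
        total[worker] += 1
        hits[worker] += int(label == truth[item])
    return {worker: hits[worker] / total[worker] for worker in total}


labels = [
    ("q1", "a", "cat"), ("q1", "b", "cat"), ("q1", "c", "dog"),
    ("q2", "a", "dog"), ("q2", "b", "dog"), ("q2", "c", "dog"),
]
truth = majority_vote(labels)
print(truth, worker_agreement(labels, truth))
```

Both passes are linear in the number of labels, which is the cost profile the abstract uses as its yardstick; iterative LA methods instead repeat such passes until convergence.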

0.5 · CL · Mar 1, 2023

Soft Prompt Guided Joint Learning for Cross-Domain Sentiment Analysis

Jingli Shi, Weihua Li, Quan Bai et al.

Aspect term extraction is a fundamental task in fine-grained sentiment analysis that aims to detect customers' opinion targets in reviews of products or services. Traditional supervised models achieve promising results on annotated datasets; however, their performance decreases dramatically when they are applied to cross-domain aspect term extraction. Existing cross-domain transfer learning methods either directly inject linguistic features into language models, making it difficult to transfer linguistic knowledge to the target domain, or rely on fixed predefined prompts, which are time-consuming to construct over all potential aspect term spans. To resolve these limitations, we propose a soft prompt-based joint learning method for cross-domain aspect term extraction. Specifically, by incorporating external linguistic features, the proposed method learns domain-invariant representations between source and target domains via multiple objectives, which bridges the gap between domains with varied distributions of aspect terms. Further, the proposed method interpolates a set of transferable soft prompts consisting of multiple learnable vectors that help detect aspect terms in the target domain. Extensive experiments are conducted on benchmark datasets, and the experimental results demonstrate the effectiveness of the proposed method for cross-domain aspect term extraction.

1.8 · LG · Oct 31, 2022

Hybrid CNN-Interpreter: Interpret local and global contexts for CNN-based Models

Wenli Yang, Guan Huang, Renjie Li et al.

Convolutional neural network (CNN) models have achieved substantial performance improvements in various domains, but their lack of interpretability is a major barrier to assurance and regulation during operation, and hence to the acceptance and deployment of AI-assisted applications. Many works on input interpretability analyze input-output relations, but the internal logic of models has not been clarified by current mainstream interpretability methods. In this study, we propose a novel hybrid CNN-interpreter through: (1) an original forward propagation mechanism that examines layer-specific prediction results for local interpretability; and (2) a new global interpretability method that indicates feature correlation and filter importance effects. By combining local and global interpretability, the hybrid CNN-interpreter provides a solid understanding and monitoring of model context throughout the whole learning process, with detailed and consistent representations. Finally, the proposed interpretability methods are demonstrated to adapt to various CNN-based model structures.

2.1 · AI · Feb 2, 2023

DOR: A Novel Dual-Observation-Based Approach for News Recommendation Systems

Mengyan Wang, Weihua Li, Jingli Shi et al.

Online social media platforms offer access to a vast amount of information, but sifting through the abundance of news can be overwhelming and tiring for readers. Personalised recommendation algorithms can help users find information that interests them. However, most existing models rely solely on observations of user behaviour, such as viewing history, ignoring the connections between the news and a user's prior knowledge. This can result in a lack of diverse recommendations for individuals. In this paper, we propose a novel method to address the complex problem of news recommendation. Our approach is based on the idea of dual observation, which involves using a deep neural network with observation mechanisms to identify the main focus of a news article as well as the focus of the user on the article. This is achieved by taking into account the user's belief network, which reflects their personal interests and biases. By considering both the content of the news and the user's perspective, our approach is able to provide more personalised and accurate recommendations. We evaluate the performance of our model on real-world datasets and show that our proposed method outperforms several popular baselines.

4.2 · CL · Feb 19, 2024

Detecting misinformation through Framing Theory: the Frame Element-based Model

Guan Wang, Rebecca Frederick, Jinglong Duan et al.

In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames, an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentation or 'framing' of accurate information can dramatically alter its interpretation, potentially leading to misinformation. We highlight this issue through real-world examples, demonstrating how shifts in narrative frames can transmute fact-based information into misinformation. To tackle this challenge, we propose an innovative approach leveraging the power of pre-trained Large Language Models and deep neural networks to detect misinformation originating from accurate facts portrayed under different frames. These advanced AI techniques offer unprecedented capabilities in identifying complex patterns within unstructured data critical for examining the subtleties of narrative frames. The objective of this paper is to bridge a significant research gap in the AI domain, providing valuable insights and methodologies for tackling framing-induced misinformation, thus contributing to the advancement of responsible and trustworthy AI technologies. Extensive experiments were conducted, and the results demonstrate the varying impact of different framing-theory elements, supporting the rationale for applying framing theory to improve misinformation detection performance.

4.2 · AI · Nov 13, 2024

PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation

Yungang Yi, Weihua Li, Matthew Kuo et al.

AI-based music generation has made significant progress in recent years. However, generating symbolic music that is both long-structured and expressive remains a significant challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture designed to address this issue by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach enhances symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details. By combining cross-attention and self-attention in a Multi-Scale setting, PerceiverS captures long-range musical structure while preserving performance nuances. The proposed model has been evaluated using the Maestro dataset and has demonstrated improvements in generating coherent and diverse music, characterized by both structural consistency and expressive variation. The project demos and the generated music samples can be accessed through the link: https://perceivers.github.io.

1.2 · SI · Jan 12

Ideological Isolation in Online Social Networks: A Survey of Computational Definitions, Metrics, and Mitigation Strategies

Xiaodan Wang, Yanbin Liu, Shiqing Wu et al.

The proliferation of online social networks has significantly reshaped the way individuals access and engage with information. While these platforms offer unprecedented connectivity, they may foster environments where users are increasingly exposed to homogeneous content and like-minded interactions. Such dynamics are associated with selective exposure and the emergence of filter bubbles, echo chambers, tunnel vision, and polarization, which together can contribute to ideological isolation and raise concerns about information diversity and public discourse. This survey provides a comprehensive computational review of existing studies that define, analyze, quantify, and mitigate ideological isolation in online social networks. We examine the mechanisms underlying content personalization, user behavior patterns, and network structures that reinforce content-exposure concentration and narrowing dynamics. This paper also systematically reviews methodological approaches for detecting and measuring these isolation-related phenomena, covering network-, content-, and behavior-based metrics. We further organize computational mitigation strategies, including network-topological interventions and recommendation-level controls, and discuss their trade-offs and deployment considerations. By integrating definitions, metrics, and interventions across structural/topological, content-based, interactional, and cognitive isolation, this survey provides a unified computational framework. It serves as a reference for understanding and addressing the key challenges and opportunities in promoting information diversity and reducing ideological fragmentation in the digital age.
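Content-based metrics of exposure concentration often come down to how evenly a user's consumed content spreads over categories (for example, topics or political leanings). One simple instance of that idea, assumed here for illustration rather than taken from the survey, is the Shannon entropy of a user's exposure distribution divided by its maximum value log K for K categories. The function name and the category labels below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def normalized_exposure_entropy(exposures: Iterable[Hashable], num_categories: int) -> float:
    """Shannon entropy of one user's exposure distribution over content categories,
    divided by log(num_categories): 0 means all exposure comes from a single
    category (most concentrated), 1 means a uniform spread (most diverse)."""
    counts = Counter(exposures)
    if len(counts) > num_categories:
        raise ValueError("more observed categories than num_categories")
    total = sum(counts.values())
    if total == 0 or num_categories < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_categories)


# Hypothetical feed of items labelled by leaning, out of three possible leanings.
feed = ["left", "left", "left", "right"]
print(normalized_exposure_entropy(feed, num_categories=3))
```

A value that falls over time for the same user would indicate narrowing exposure. This captures only the content dimension; network- and behavior-based metrics measure other aspects, such as how homogeneous a user's ties or interactions are.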

0.9 · CL · May 26, 2023

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

Guan Wang, Weihua Li, Edmund M-K. Lai et al.

The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the "adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.

3.7 · CV · Dec 19, 2021

Parallel Multi-Scale Networks with Deep Supervision for Hand Keypoint Detection

Renjie Li, Son Tran, Saurabh Garg et al.

Keypoint detection plays an important role in a wide range of applications. However, predicting keypoints of small objects such as human hands is a challenging problem. Recent works fuse feature maps of deep Convolutional Neural Networks (CNNs), either via multi-level feature integration or multi-resolution aggregation. Despite some success, these feature fusion approaches increase the complexity and opacity of CNNs. To address this issue, we propose a novel CNN model named Parallel Multi-Scale Deep Supervision Network (P-MSDSNet) that learns feature maps at different scales with deep supervision to produce attention maps for adaptive feature propagation from layer to layer. P-MSDSNet has a multi-stage architecture, which makes it scalable, while its deep supervision with spatial attention improves the transparency of feature learning at each stage. We show that P-MSDSNet outperforms state-of-the-art approaches on benchmark datasets while requiring fewer parameters. We also show the application of P-MSDSNet to quantifying finger tapping hand movements in a neuroscience study.

1.4 · CV · Oct 27, 2021

Hand gesture detection in tests performed by older adults

Guan Huang, Son N. Tran, Quan Bai et al.

Our team are developing a new online test that analyses hand movement features associated with ageing and that can be completed remotely from the research centre. To obtain hand movement features, participants will be asked to perform a variety of hand gestures using their own computer cameras. However, it is challenging to collect high-quality hand movement video data, especially from older participants, many of whom have no IT background. During data collection, one of the key steps is to detect whether participants are following the test instructions correctly, and to detect similar gestures captured by different devices. Furthermore, this process needs to be automated and accurate, as we expect many thousands of participants to complete the test. We have implemented a hand gesture detector for the hand movement tests that achieves a detection mAP of 0.782, which is better than the state of the art. In this research, we processed 20,000 images collected from hand movement tests and labelled 6,450 of them to detect different hand gestures. This paper makes three contributions. Firstly, we compared and analysed the performance of different network structures for hand gesture detection. Secondly, we made many attempts to improve the accuracy of the model and succeeded in improving the classification accuracy for similar gestures by implementing attention layers. Thirdly, we created two datasets and included 20 percent blurred images in the dataset to investigate how different network structures are impacted by noisy data; our experiments also show that our network performs better on the noisy dataset.
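Detection mAP is computed by matching predicted boxes to ground-truth boxes. In common detection benchmarks, a prediction counts as a true positive when its intersection over union (IoU) with a not-yet-matched ground-truth box of the same class meets a threshold, often 0.5; the abstract does not state which threshold or evaluation protocol produced the 0.782 figure. The sketch below shows only the IoU test, with hypothetical function names and an assumed (x_min, y_min, x_max, y_max) box format.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of an axis-aligned box (0 for degenerate boxes)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 if they do not overlap)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical matching rule: a prediction hits a ground-truth box if IoU >= threshold."""
    return iou(pred, gt) >= threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The two example boxes overlap in a 1 × 1 square, so their IoU is 1/7 and the prediction would not count as a hit at 0.5. Full mAP then sorts predictions by confidence, accumulates precision and recall per gesture class, and averages the area under each precision-recall curve across classes.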

2.3 · SI · Jul 13, 2021

Identifying Influential Users in Unknown Social Networks for Adaptive Incentive Allocation Under Budget Restriction

Shiqing Wu, Weihua Li, Hao Shen et al.

In recent years, recommendation systems have been widely applied in many domains. However, these systems have limited power to steer users toward the behaviors the system expects. Meanwhile, providing incentives has been shown to be a more proactive way to affect users' behaviors. Due to budget limitations, the number of users who can be incentivized is restricted. In this light, we intend to utilize the social influence among users to enhance the effect of incentivization: by incentivizing influential users directly, their followers in the social network may be incentivized indirectly. However, in many real-world scenarios, the topological structure of the network is unknown, which makes identifying influential users difficult. To tackle these challenges, we propose a novel algorithm for exploring influential users in unknown networks, which estimates the influence relationships among users from their historical behaviors without knowing the network topology. Meanwhile, we design an adaptive incentive allocation approach that determines incentive values based on users' preferences and their influence ability. We evaluate the proposed approaches through experiments on both synthetic and real-world datasets. The experimental results demonstrate the effectiveness of the proposed approaches.

0.5 · CL · Jun 18, 2021

Graph-based Joint Pandemic Concern and Relation Extraction on Twitter

Jingli Shi, Weihua Li, Sira Yongchareon et al.

Public concern detection can provide guidance to authorities for crisis management before or during a pandemic outbreak. Detecting people's concerns and attention on online social media platforms has been widely acknowledged as an effective approach to relieving public panic and preventing social crises. However, detecting concerns in a timely manner from the massive volume of information on social media is a major challenge, especially when sufficient manually labeled data is unavailable during public health emergencies, e.g., COVID-19. In this paper, we propose a novel end-to-end deep learning model to identify people's concerns and the corresponding relations based on a Graph Convolutional Network and a Bi-directional Long Short-Term Memory integrated with a Concern Graph. In addition to the sequential features from BERT embeddings, the regional features of tweets can be extracted by the Concern Graph module, which not only benefits concern detection but also makes our model highly noise-tolerant. Thus, our model can address the issue of insufficient manually labeled data. We conduct extensive experiments to evaluate the proposed model using both manually labeled and automatically labeled tweets. The experimental results show that our model outperforms state-of-the-art models on real-world datasets.

2.4 · AI · Apr 29, 2021

Applications of Artificial Intelligence to aid detection of dementia: a narrative review on current capabilities and future directions

Renjie Li, Xinyi Wang, Katherine Lawler et al.

With populations ageing, the number of people with dementia worldwide is expected to triple to 152 million by 2050. Seventy percent of cases are due to Alzheimer's disease (AD) pathology, and there is a 10-20 year 'pre-clinical' period before significant cognitive decline occurs. We urgently need cost-effective, objective methods to detect AD and other dementias at an early stage. Risk factor modification could prevent 40% of cases, and drug trials would have greater chances of success if participants were recruited at an earlier stage. Currently, dementia is detected largely by pen-and-paper cognitive tests, but these are time-consuming and insensitive to pre-clinical phases. Specialist brain scans and body fluid biomarkers can detect the earliest stages of dementia but are too invasive or expensive for widespread use. With the advancement of technology, Artificial Intelligence (AI) shows promising results in assisting with the detection of early-stage dementia. Existing AI-aided methods and potential future research directions are reviewed and discussed.

1.2 · SI · Apr 14, 2021

ABEM: An Adaptive Agent-based Evolutionary Approach for Mining Influencers in Online Social Networks

Weihua Li, Yuxuan Hu, Shiqing Wu et al.

A key step in influence maximization in online social networks is the identification of a small number of users, known as influencers, who are able to spread influence quickly and widely to other users. The evolving nature of the topological structure of these networks makes it difficult to locate and identify these influencers. In this paper, we propose an adaptive agent-based evolutionary approach to address this problem in the context of both static and dynamic networks. This approach is shown to be able to adapt the solution as the network evolves. It is also applicable to large-scale networks due to its distributed framework. Evaluation of our approach is performed by using both synthetic networks and real-world datasets. Experimental results demonstrate that the proposed approach outperforms state-of-the-art seeding algorithms in terms of maximizing influence.
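ABEM itself is an agent-based evolutionary method and is not reproduced here. For context, the sketch below shows a classic seeding baseline from the influence maximization literature: greedy selection of k seeds that maximise Monte Carlo estimates of expected spread under the independent cascade model on a static graph. The uniform propagation probability `p`, the adjacency-dict graph format and all function names are assumptions.

```python
import random
from typing import Dict, Hashable, List, Sequence

# Assumed format: directed graph as {node: [out-neighbours]}.
Graph = Dict[Hashable, List[Hashable]]


def simulate_cascade(graph: Graph, seeds: Sequence[Hashable], p: float, rng: random.Random) -> int:
    """One independent-cascade run: each newly activated node gets a single chance,
    with probability p, to activate each inactive out-neighbour. Returns #active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph: Graph, seeds: Sequence[Hashable], p: float, runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of expected spread (fixed RNG seed for reproducibility)."""
    rng = random.Random(seed)
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph: Graph, k: int, p: float = 0.1, runs: int = 200) -> List[Hashable]:
    """k times, add the node whose addition gives the largest estimated spread
    (equivalently, the largest estimated marginal gain)."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    candidates = sorted(nodes, key=str)  # deterministic tie-breaking
    seeds: List[Hashable] = []
    for _ in range(k):
        best = max(
            (n for n in candidates if n not in seeds),
            key=lambda n: expected_spread(graph, seeds + [n], p, runs),
        )
        seeds.append(best)
    return seeds


g = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["f"]}
print(greedy_seeds(g, k=2, p=1.0, runs=1))
```

With `p=1.0` every edge fires, so spread reduces to reachability and the toy graph's best pair is `["a", "e"]`. This greedy search reruns many simulations per candidate and assumes a fixed snapshot; the abstract's point is that ABEM adapts its solution as the network evolves and scales through a distributed framework.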

2.4 · AI · Jan 27, 2021

Privacy Information Classification: A Hybrid Approach

Jiaqi Wu, Weihua Li, Quan Bai et al.

A large amount of information is published to online social networks (OSNs) every day, and end-users may unintentionally disclose individual privacy-related information. Identifying privacy-related data and protecting OSN users from privacy leakage has therefore become important. With this motivation, this study proposes and develops a hybrid privacy classification approach to detect and classify privacy information in OSNs. The proposed hybrid approach employs both deep learning models and ontology-based models for privacy-related information extraction. Extensive experiments are conducted to validate the proposed hybrid approach, and the empirical results demonstrate its superiority in helping OSN users guard against privacy leakage.

4.2 · CV · Jun 27, 2020

An Evoked Potential-Guided Deep Learning Brain Representation For Visual Classification

Xianglin Zheng, Zehong Cao, Quan Bai

The new perspective in visual classification aims to decode the feature representation of visual objects from human brain activity. Recording electroencephalogram (EEG) signals from the brain cortex has become a prevalent approach to understanding the cognitive process of an image classification task. In this study, we proposed a deep learning framework for visual classification guided by visual evoked potentials extracted from EEG signals, called the Event-Related Potential (ERP)-Long Short-Term Memory (LSTM) framework. Specifically, we first extracted ERP sequences from multiple EEG channels to capture image stimulus-related information. Then, we trained an LSTM network to learn the feature representation space of visual objects for classification. In the experiment, over 50,000 EEG trials were recorded from 10 subjects using an image dataset with 6 categories comprising a total of 72 exemplars. Our proposed ERP-LSTM framework achieved cross-subject classification accuracies of 66.81% and 27.08% for categories (6 classes) and exemplars (72 classes), respectively. These results outperformed existing visual classification frameworks, improving classification accuracy by 12.62% to 53.99%. Our findings suggest that decoding visual evoked potentials from EEG signals is an effective strategy for learning discriminative brain representations for visual classification.

2.8 · SE · May 15, 2019

Specifying and Reasoning about Contextual Preferences in the Goal-oriented Requirements Modelling

Khavee Agustus Botangen, Jian Yu, Sira Yongchareon et al.

Goal-oriented requirements variability modelling has established an understanding of adaptability in the early stage of software development, the Requirements Engineering phase. It considers both the intentions, which are captured as goals in goal models, and the preferences of different stakeholders as the main sources of system behaviour variability. Most often, however, intentions and preferences vary according to context. In this paper, we propose an approach for contextual preference-based requirements variability analysis in goal-oriented Requirements Engineering. We introduce a quantitative contextual preference specification to express the varying preferences imposed over requirements represented in the goal model. Such contextual preferences are used as criteria to evaluate alternative solutions to the requirements variability problem. We utilise a state-of-the-art reasoning implementation from the Answer Set Programming domain to automate the derivation and evaluation of solutions that fulfil the goals and satisfy the contextual preferences. Our approach will support systems analysts in deciding among alternative design solutions that define subsequent system implementations.
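The paper automates this evaluation with an Answer Set Programming solver. As a much smaller illustration of the underlying idea only, the plain-Python sketch below scores alternative solutions by summing the weights of contextual preferences whose context is active and whose preferred requirement the alternative satisfies, then picks the highest-scoring alternative. The (context, requirement, weight) encoding, the additive scoring rule and all names are hypothetical, not the paper's specification language.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: in `context`, satisfying `requirement` is worth `weight`.
Preference = Tuple[str, str, float]


def score(satisfied: Set[str], active_contexts: Set[str], prefs: List[Preference]) -> float:
    """Sum the weights of preferences that apply in the active contexts and are satisfied."""
    return sum(w for ctx, req, w in prefs if ctx in active_contexts and req in satisfied)


def best_alternative(
    alternatives: Dict[str, Set[str]], active_contexts: Set[str], prefs: List[Preference]
) -> str:
    """Return the alternative with the highest score (ties: alphabetically first name)."""
    return max(
        sorted(alternatives),
        key=lambda name: score(alternatives[name], active_contexts, prefs),
    )


# Hypothetical alternatives, each listing the requirements it satisfies.
alternatives = {"sms_alert": {"low_cost"}, "phone_call": {"fast_response"}}
prefs = [("at_night", "low_cost", 1.0), ("emergency", "fast_response", 3.0)]
print(best_alternative(alternatives, {"emergency"}, prefs))  # phone_call
print(best_alternative(alternatives, {"at_night"}, prefs))  # sms_alert
```

The same alternatives rank differently as the active context changes, which is the variability the abstract describes. Unlike an ASP encoding, this brute-force scoring does not derive which alternatives satisfy the goal model in the first place; it assumes they are already valid solutions.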