Chuanbo Hu

h-index: 13

16 papers

289 citations

Novelty: 42%

AI Score: 42

Ranked #61,564 of 194,257 authors (top 32%); #21,184 in CV (top 36%)

16 Papers

2.1 · CL · Jul 7, 2023

Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media

Chuanbo Hu, Bin Liu, Xin Li et al.

Social media platforms such as Instagram and Twitter have emerged as critical channels for drug marketing and illegal sale. Detecting and labeling online illicit drug trafficking activities has therefore become important in addressing this issue. However, the effectiveness of conventional supervised learning methods in detecting drug trafficking relies heavily on access to substantial amounts of labeled data, and data annotation is time-consuming and resource-intensive. Furthermore, these models often struggle to identify trafficking activities accurately when drug dealers use deceptive language and euphemisms to avoid detection. To overcome these limitations, we conduct the first systematic study on leveraging large language models (LLMs), such as ChatGPT, to detect illicit drug trafficking activities on social media. We propose an analytical framework to compose knowledge-informed prompts, which serve as the interface through which humans interact with LLMs and use them to perform the detection task. Additionally, we design a Monte Carlo dropout based prompt optimization method to further improve performance and interpretability. Our experimental findings demonstrate that the proposed framework outperforms baseline language models in drug trafficking detection accuracy, with an improvement of nearly 12%. By integrating prior knowledge and the proposed prompts, ChatGPT can effectively identify and label drug trafficking activities on social networks, even in the presence of deceptive language and euphemisms used by drug dealers to evade detection. The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.
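
The abstract does not spell out how the Monte Carlo dropout based prompt optimization works. One plausible reading, sketched here purely as an assumption, treats a prompt as a set of knowledge components, randomly drops components over many samples, and attributes score changes to each component. The helper `mc_dropout_component_scores`, the component names, and `toy_accuracy` are hypothetical; in practice `evaluate` would query an LLM on a labeled validation set.

```python
import random
from statistics import mean
from typing import Callable, Dict, List


def mc_dropout_component_scores(
    components: List[str],
    evaluate: Callable[[List[str]], float],
    n_samples: int = 200,
    keep_prob: float = 0.5,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical helper: score prompt components by random dropout.

    Each Monte Carlo sample keeps every component independently with
    probability `keep_prob`, builds a prompt from the kept components, and
    scores it with `evaluate`. A component's score is
    mean(score | component kept) - mean(score | component dropped).
    """
    rng = random.Random(seed)
    kept: Dict[str, List[float]] = {c: [] for c in components}
    dropped: Dict[str, List[float]] = {c: [] for c in components}
    for _ in range(n_samples):
        mask = [rng.random() < keep_prob for _ in components]
        prompt = [c for c, keep in zip(components, mask) if keep]
        score = evaluate(prompt)
        for c, keep in zip(components, mask):
            (kept if keep else dropped)[c].append(score)
    return {
        c: (mean(kept[c]) if kept[c] else 0.0) - (mean(dropped[c]) if dropped[c] else 0.0)
        for c in components
    }


# Hypothetical toy evaluator: only the euphemism glossary raises accuracy.
def toy_accuracy(prompt: List[str]) -> float:
    return 0.6 + 0.2 * ("euphemism glossary" in prompt)


scores = mc_dropout_component_scores(
    ["role instruction", "euphemism glossary", "output format"], toy_accuracy
)
```

With this toy evaluator, the glossary's score is about 0.2 and the other components score near zero, so the glossary would be the component worth keeping.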

6.8 · CV · Apr 20, 2023

Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis

Mindi Ruan, Xiangxu Yu, Na Zhang et al.

How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision tree-based classification of actions, including human-human interactions (HHI) and human-object interactions (HOI). The key idea is to translate the original multi-class action recognition problem into a series of binary classification tasks on a pre-constructed decision tree. Within this contrastive learning framework, we design an interaction adjacent matrix (IAM) with skeleton graphs as the backbone for modeling various action-related attributes such as periodicity and symmetry. By constructing various pretext tasks, we obtain a series of binary classification nodes on the decision tree that can be combined to support higher-level recognition tasks. Our experiments demonstrate the potential of the approach in real-world applications ranging from interaction recognition to symmetry detection. In particular, we demonstrate promising performance for video-based autism spectrum disorder (ASD) diagnosis on the CalTech interview video database.
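
Replacing one multi-class decision with a chain of binary decisions can be illustrated with a small tree walker. This is a minimal sketch under stated assumptions: the `Node` and `classify` names, the two input features, and the threshold classifiers are hypothetical stand-ins for the learned binary nodes, not the paper's contrastive model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Features = Sequence[float]


@dataclass
class Node:
    """Leaf if `label` is set; otherwise an internal binary-classification node."""
    label: Optional[str] = None
    classifier: Optional[Callable[[Features], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None


def classify(node: Node, x: Features) -> str:
    """Answer one binary question per level until a leaf label is reached."""
    while node.label is None:
        if node.classifier is None:
            raise ValueError("internal node has no classifier")
        nxt = node.if_true if node.classifier(x) else node.if_false
        if nxt is None:
            raise ValueError("incomplete tree")
        node = nxt
    return node.label


# Hypothetical features: x[0] = second-person presence, x[1] = motion symmetry.
tree = Node(
    classifier=lambda x: x[0] > 0.5,          # human-human interaction?
    if_true=Node(
        classifier=lambda x: x[1] > 0.5,      # symmetric motion?
        if_true=Node(label="handshake"),
        if_false=Node(label="push"),
    ),
    if_false=Node(label="pick up object"),    # human-object interaction
)
```

Each internal node only has to separate two groups, which is what lets separately trained binary classifiers be recombined into a multi-class decision.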

5.7 · CV · May 14, 2022 · Code

A Saliency-Guided Street View Image Inpainting Framework for Efficient Last-Meters Wayfinding

Chuanbo Hu, Shan Jia, Fan Zhang et al.

Global Positioning Systems (GPS) have played a crucial role in various navigation applications. Nevertheless, locating the exact destination within the last few meters remains an important but unresolved problem. Limited by GPS positioning accuracy, navigation systems always show users the vicinity of a destination, but not its exact location. Street view images (SVI) in maps, as an immersive media technology, support last-meters wayfinding by showing users the physical environment. However, due to the large diversity of geographic contexts and acquisition conditions, captured SVI always contain various distracting objects (e.g., pedestrians and vehicles), which draw human visual attention away from efficiently finding the destination in the last few meters. To address this problem, we highlight the importance of reducing visual distraction in image-based wayfinding by proposing a saliency-guided image inpainting framework. It aims to redirect human visual attention from distracting objects to destination-related objects for more efficient and accurate wayfinding in the last meters. Specifically, we design a context-aware distracting object detection method, driven by deep salient object detection, to extract distracting objects at three semantic levels in SVI. We then employ a large-mask inpainting method with fast Fourier convolutions to remove the detected distracting objects. Qualitative and quantitative experimental results show that our saliency-guided inpainting method not only achieves high perceptual quality in street view images but also redirects human visual attention toward static location-related objects rather than distracting ones. A human-based evaluation further confirmed that our method improves the efficiency of locating the target destination.

3.3 · NC · Sep 1, 2024

Video-based Analysis Reveals Atypical Social Gaze in People with Autism Spectrum Disorder

Xiangxu Yu, Mindi Ruan, Chuanbo Hu et al.

In this study, we present a quantitative and comprehensive analysis of social gaze in people with autism spectrum disorder (ASD). Unlike traditional eye-tracking approaches based on first-person camera perspectives, this study uses a third-person perspective database of Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2) interview videos, encompassing ASD participants and neurotypical individuals as a reference group. Using computational models, we extracted and processed gaze-related features from the videos of both participants and examiners. The experimental samples were divided into three groups based on the presence of social gaze abnormalities and ASD diagnosis. We quantitatively analyzed four gaze features: gaze engagement, gaze variance, gaze density map, and gaze diversion frequency. Furthermore, we developed a classifier trained on these features to identify gaze abnormalities in ASD participants. Together, these results demonstrate the effectiveness of analyzing social gaze in people with ASD in naturalistic settings and showcase the potential of third-person video perspectives for enhancing ASD diagnosis through gaze analysis.

5.2 · CV · Sep 22, 2024

URSimulator: Human-Perception-Driven Prompt Tuning for Enhanced Virtual Urban Renewal via Diffusion Models

Chuanbo Hu, Shan Jia, Xin Li

Tackling Urban Physical Disorder (e.g., abandoned buildings, litter, messy vegetation, graffiti) is essential, as it negatively impacts the safety, well-being, and psychological state of communities. Urban Renewal is the process of revitalizing these neglected and decayed areas within a city to improve the physical environment and quality of life for residents. Effective urban renewal efforts can transform these environments, enhancing their appeal and livability. However, current research lacks simulation tools that can quantitatively assess and visualize the impacts of renewal efforts, often relying on subjective judgments. Such tools are crucial for planning and implementing effective strategies by providing a clear visualization of potential changes and their impacts. This paper presents a novel framework addressing this gap by using human perception feedback to simulate street environment enhancement. We develop a prompt tuning approach that integrates text-driven Stable Diffusion with human perception feedback, iteratively editing local areas of street view images to better align with perceptions of beauty, liveliness, and safety. Our experiments show that this framework significantly improves perceptions of urban environments, with increases of 17.60% in safety, 31.15% in beauty, and 28.82% in liveliness. In contrast, advanced methods like DiffEdit achieve only 2.31%, 11.87%, and 15.84% improvements, respectively. We applied this framework across various virtual scenarios, including neighborhood improvement, building redevelopment, green space expansion, and community garden creation. The results demonstrate its effectiveness in simulating urban renewal, offering valuable insights for urban planning and policy-making.

6.6 · CL · May 21

A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

Chuanbo Hu, Minglei Yin, Bin Liu et al.

Characteristic linguistic behaviors associated with Social Language Disorder (SLD) in autism spectrum disorder, including echoic repetition, pronoun displacement, and stereotyped media quoting, are largely absent from spontaneous conversation and emerge only under specific conversational conditions. In structured clinical assessments, this latency means that the choice of questioning strategy is a critical yet underappreciated determinant of how much diagnostic information a conversation yields. Whether large language models (LLMs) can be guided to proactively select questioning strategies that systematically surface these latent traits remains largely unexplored. Here we present TPA (Think, Plan, Ask), a proactive multi-agent dialogue framework applied to the language assessment component of Module 4 of the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2), in which a doctor agent explicitly reasons about which traits remain unobserved before selecting a clinically grounded strategy and generating a targeted question. A patient agent grounded in real ADOS-2 clinical data enables reproducible evaluation without real patient participation; three independent experiments confirm its adequate fidelity to real patient language. Evaluated on 484 episodes from 35 patients, TPA outperforms six competitive dialogue planning baselines across all primary metrics, achieving 82.1% SLD trait coverage, 16.6 percentage points higher than automated replay of real clinical dialogues conducted by trained clinicians (65.5%), with substantially greater per-turn diagnostic efficiency (AUCC: 0.628 vs. 0.458, absolute gain +0.170). These results demonstrate that proactive questioning strategy selection substantially improves the efficiency of automated SLD trait assessment, with direct implications for scalable AI-assisted clinical screening.

31.6 · AI · Mar 21, 2024 · Code

Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

Shan Jia, Reilin Lyu, Kangran Zhao et al.

DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means of disinformation. DeepFake detection currently relies on programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to evaluate multimodal LLMs and show that, with careful experimental design and prompt engineering, they can expose AI-generated images. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

6.7 · SD · May 3, 2024

Exploring Speech Pattern Disorders in Autism using Machine Learning

Chuanbo Hu, Jacob Thrasher, Wenqi Li et al.

Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns in examiner-patient dialogues presents significant challenges because speech-related symptoms manifest subtly and diversely in affected individuals. This study presents a comprehensive approach to identifying distinctive speech patterns through the analysis of examiner-patient dialogues. Using a dataset of recorded dialogues, we extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs), and balance. These features cover various aspects of speech such as intonation, volume, rhythm, and speech rate, reflecting the complex nature of communicative behaviors in ASD. We applied machine learning to both classification and regression tasks to analyze these speech features. The classification model, which aimed to differentiate between ASD and non-ASD cases, achieved an accuracy of 87.75%. Regression models were developed to predict speech-pattern-related variables and a composite score from all variables, facilitating a deeper understanding of the speech dynamics associated with ASD. The effectiveness of machine learning in interpreting intricate speech patterns and the high classification accuracy underscore the potential of computational methods to support diagnostic processes for ASD. This approach not only aids early detection but also contributes to personalized treatment planning by providing insights into the speech and communication profiles of individuals with ASD.
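
Two of the feature families listed above, zero-crossing rate and energy, have simple frame-level definitions. The sketch below uses common textbook formulas (sign-change fraction for ZCR, mean squared amplitude for energy); the helper names, frame length, and hop are assumptions, and the paper's full 40-feature extraction is not reproduced here.

```python
from typing import List, Sequence


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into frames of `frame_len` samples every `hop` samples.

    A trailing partial frame is dropped.
    """
    return [x[i : i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


# Usage on a toy signal: an alternating frame followed by a constant frame.
frames = frame_signal([1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5], frame_len=4, hop=4)
zcr = [zero_crossing_rate(f) for f in frames]     # [1.0, 0.0]
energy = [short_time_energy(f) for f in frames]   # [1.0, 0.25]
```

Per-frame values like these are typically summarized (e.g., mean and variance over an utterance) before being fed to a classifier or regressor.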

1.0 · CL · May 3, 2024 · Code

Exploiting ChatGPT for Diagnosing Autism-Associated Language Disorders and Identifying Distinct Features

Chuanbo Hu, Wenqi Li, Mindi Ruan et al.

Diagnosing language disorders associated with autism is a complex challenge, often hampered by the subjective nature and variability of traditional assessment methods. These methods not only require intensive human effort but also often delay intervention because they lack speed and precision. In this study, we explored the application of ChatGPT, a large language model, to overcome these obstacles by enhancing sensitivity and profiling linguistic features for autism diagnosis. This research uses ChatGPT's natural language processing capabilities to simplify and improve the diagnostic process, focusing on identifying autism-related language patterns. Specifically, we compared ChatGPT's performance with that of conventional supervised learning models, including BERT, a model acclaimed for its effectiveness in various natural language processing tasks. We showed that ChatGPT, in a zero-shot learning configuration, substantially outperformed these models, achieving over 10% improvement in both sensitivity and positive predictive value. The findings underscore the model's potential as a diagnostic tool that combines accuracy and applicability. We identified ten key features of autism-associated language disorders across scenarios. Features such as echolalia, pronoun reversal, and atypical language usage play a critical role in diagnosing ASD and informing tailored treatment plans. Together, our findings advocate for adopting sophisticated AI tools like ChatGPT in clinical settings to assess and diagnose developmental disorders. Our approach promises enhanced diagnostic precision and supports personalized medicine, potentially transforming the evaluation landscape for autism and similar neurological conditions.
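
Sensitivity and positive predictive value, the two metrics reported above, follow directly from confusion-matrix counts. A minimal sketch, assuming binary labels with 1 = autism-associated language disorder present; the function name `sensitivity_ppv` is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for binary labels (1 = positive case).

    sensitivity = TP / (TP + FN): share of true cases the model flags.
    PPV         = TP / (TP + FP): share of flagged cases that are true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv
```

For screening, sensitivity measures how few true cases are missed, while PPV measures how many flagged cases are false alarms; reporting both guards against a model that trivially flags everyone.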

5.0 · CV · May 4, 2023

UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery

Chuanbo Hu, Shan Jia, Fan Zhang et al.

Urban Physical Disorder (UPD), such as old or abandoned buildings, broken sidewalks, litter, and graffiti, has a negative impact on residents' quality of life. UPD can also increase crime rates, cause social disorder, and pose a public health risk. Currently, efficient and reliable methods for detecting and understanding UPD are lacking. To bridge this gap, we propose UPDExplainer, an interpretable transformer-based framework for UPD detection. We first develop a UPD detection model based on the Swin Transformer architecture, which leverages readily accessible street view images to learn discriminative representations. To provide clear and comprehensible evidence and analysis, we then introduce a UPD factor identification and ranking module that combines visual explanation maps with semantic segmentation maps. This integrated approach enables us to identify the exact objects within street view images that are responsible for physical disorder and to gain insights into the underlying causes. Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance, with an accuracy of 79.9%. To evaluate the method's ranking performance comprehensively, we report mean Average Precision (mAP), R-Precision (RPrec), and Normalized Discounted Cumulative Gain (NDCG), with values of 75.51%, 80.61%, and 82.58%, respectively. We also present a case study of detecting and ranking physical disorders in the southern region of downtown Los Angeles, California, to demonstrate the practicality and effectiveness of our framework.
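
The three ranking metrics reported above have standard definitions. A minimal sketch, assuming binary relevance flags for AP and R-Precision and graded gains for NDCG; how relevance was assigned to UPD factors in the paper is not stated here, so the inputs and function names are hypothetical.

```python
import math
from typing import Sequence


def average_precision(rels: Sequence[int]) -> float:
    """AP of one ranked list; rels[k] = 1 if the item at rank k + 1 is relevant.

    Assumes every relevant item appears somewhere in `rels`.
    """
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """mAP: AP averaged over queries (e.g., one ranked list per image)."""
    return sum(average_precision(q) for q in queries) / len(queries)


def r_precision(rels: Sequence[int]) -> float:
    """Precision among the top R items, where R = number of relevant items."""
    r = sum(rels)
    return sum(rels[:r]) / r if r else 0.0


def ndcg(gains: Sequence[float]) -> float:
    """NDCG with DCG = sum(gain_i / log2(i + 1)) over 1-based ranks i."""
    def dcg(g: Sequence[float]) -> float:
        return sum(x / math.log2(i + 2) for i, x in enumerate(g))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```

For the ranking [relevant, not, relevant, not], AP is (1/1 + 2/3)/2 ≈ 0.833 and R-Precision is 1/2, which shows how AP rewards partial credit for late hits while R-Precision only looks at the top R positions.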

7.5 · LG · Aug 19, 2021

Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach

Chuanbo Hu, Minglei Yin, Bin Liu et al.

Social media platforms such as Instagram and Twitter have become important channels for marketing and selling illicit drugs. Detecting online illicit drug trafficking has become critical to combating the online trade of illicit drugs. However, the legal status of a drug often varies across locations and over time; even for the same drug, federal and state legislation can regulate its legality differently. Meanwhile, a growing number of drug trafficking events are disguised as a novel form of advertising comments, leading to information heterogeneity. As a result, accurately detecting illicit drug trafficking events (IDTEs) on social media has become even more challenging. In this work, we conduct the first systematic study on fine-grained detection of IDTEs on Instagram. We propose a deep multimodal multilabel learning (DMML) approach to detect IDTEs and demonstrate its effectiveness on a newly constructed dataset called multimodal IDTE (MM-IDTE). Specifically, our model takes text and image data as input and combines multimodal information to predict multiple labels of illicit drugs. Inspired by the success of BERT, we developed a self-supervised multimodal bidirectional transformer by jointly fine-tuning pretrained text and image encoders. We constructed the large-scale MM-IDTE dataset, with manually annotated multiple drug labels, to support fine-grained detection of illicit drugs. Extensive experimental results on the MM-IDTE dataset show that the proposed DMML methodology can accurately detect IDTEs even in the presence of special characters and style changes intended to evade detection.

5.5 · LG · Aug 18, 2021

Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion

Chuanbo Hu, Minglei Yin, Bin Liu et al.

Illicit drug trafficking via social media sites such as Instagram has become a severe problem, drawing a great deal of attention from law enforcement and public health agencies. Identifying illicit drug dealers from social media data has remained a technical challenge for two reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). In total, nearly 4,000 user accounts, of which over 1,400 belong to drug dealers, have been collected from Instagram with multiple data sources, including post comments, post images, homepage bios, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.

1.2 · CV · Aug 26, 2020

Detection of Genuine and Posed Facial Expressions of Emotion: A Review

Shan Jia, Shuo Wang, Chuanbo Hu et al.

Facial expressions of emotion play an important role in human social interactions. However, posed expressions do not always reflect genuine feelings. Therefore, assessing the credibility of facial expressions, namely, discriminating genuine (spontaneous) expressions from posed (deliberate/volitional/deceptive) ones, is a crucial yet challenging task in facial expression understanding. Rapid progress has been made in recent years in the automatic detection of genuine and posed facial expressions. This paper presents a general review of the relevant research, including several spontaneous vs. posed (SVP) facial expression databases and various computer vision-based detection methods. In addition, a variety of factors that influence the performance of SVP detection methods are discussed, along with open issues and technical challenges.

14.0 · CV · May 12, 2020

3D Face Anti-spoofing with Factorized Bilinear Coding

Shan Jia, Xin Li, Chuanbo Hu et al.

We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years. Compared with widely studied 2D face presentation attacks, 3D face spoofing attacks are more challenging because face recognition systems are more easily confused by the 3D characteristics of materials similar to real faces. In this work, we tackle the problem of detecting these realistic 3D face presentation attacks and propose a novel anti-spoofing method from the perspective of fine-grained classification. Our method, based on factorized bilinear coding of multiple color channels (MC_FBC), aims to learn subtle fine-grained differences between real and fake images. By extracting discriminative information and fusing complementary information from the RGB and YCbCr color spaces, we have developed a principled solution to 3D face spoofing detection. We have also collected a large-scale wax figure face database (WFFD), containing both images and videos, as super-realistic attacks to facilitate the study of 3D face presentation attack detection. Extensive experimental results show that our proposed method achieves state-of-the-art performance on both our own WFFD and other face spoofing databases under various intra-database and inter-database testing scenarios.
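
The method above draws on both the RGB and YCbCr color spaces. As an assumption about the preprocessing step only, the sketch converts an 8-bit RGB pixel with the full-range ITU-R BT.601 (JPEG/JFIF) coefficients and stacks both representations; the helper names are hypothetical, and the paper's factorized bilinear coding itself is not shown.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Pixel:
    """Full-range BT.601 (JPEG/JFIF) conversion for 8-bit values, without clipping."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luma
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red-difference chroma
    return y, cb, cr


def six_channel(r: float, g: float, b: float) -> Tuple[float, ...]:
    """Hypothetical early fusion: concatenate RGB and YCbCr values for one pixel."""
    return (r, g, b) + rgb_to_ycbcr(r, g, b)
```

Separating luma from chroma is one reason YCbCr is often paired with RGB in face anti-spoofing: material differences between skin and wax can show up more clearly in the chroma channels.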

2.6 · CV · Oct 12, 2019

Spoofing and Anti-Spoofing with Wax Figure Faces

Shan Jia, Xin Li, Chuanbo Hu et al.

We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years. Compared to widely studied 2D face presentation attacks (e.g., printed photos and video replays), 3D face presentation attacks are more challenging because face recognition systems (FRS) are more easily confused by the 3D characteristics of materials similar to real faces. Existing 3D face spoofing databases, mostly based on 3D facial masks, are restricted to small data sizes and suffer from poor authenticity due to the difficulty and expense of mask production. In this work, we introduce a wax figure face database (WFFD) as a novel and super-realistic 3D face presentation attack. This database contains 2300 image pairs (4600 images in total) covering 745 subjects, including both real and wax figure faces with high diversity from online collections. On the one hand, our experiments demonstrate the spoofing potential of WFFD on three popular FRSs. On the other hand, we have developed a multi-feature voting scheme for wax figure face detection (anti-spoofing), which combines three discriminative features at the decision level. The proposed detection method was compared against several face PAD approaches and outperformed the competing methods. Surprisingly, our fusion-based detection method achieves an Average Classification Error Rate (ACER) of 11.73% on the WFFD database, which is even better than human-based detection.
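
ACER, the metric reported above, is the average of the attack and bona fide classification error rates. A minimal sketch, assuming labels 1 = attack (e.g., a wax figure face) and 0 = bona fide; it pools all attack types into a single APCER, whereas ISO/IEC 30107-3 reports APCER per attack instrument species and some benchmark protocols take the worst case. The function name `acer` is hypothetical.

```python
from typing import Sequence


def acer(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACER = (APCER + BPCER) / 2 with labels 1 = attack, 0 = bona fide.

    APCER: share of attack samples wrongly accepted as bona fide.
    BPCER: share of bona fide samples wrongly rejected as attacks.
    Both classes must be present in y_true.
    """
    attacks = [p for t, p in zip(y_true, y_pred) if t == 1]
    bona_fide = [p for t, p in zip(y_true, y_pred) if t == 0]
    apcer = sum(1 for p in attacks if p == 0) / len(attacks)
    bpcer = sum(1 for p in bona_fide if p == 1) / len(bona_fide)
    return (apcer + bpcer) / 2
```

For example, missing 1 of 4 attacks (APCER 0.25) and rejecting 2 of 4 genuine faces (BPCER 0.5) gives an ACER of 0.375.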

6.5 · CV · Jun 6, 2019

A database for face presentation attack using wax figure faces

Shan Jia, Chuanbo Hu, Guodong Guo et al.

Compared to 2D face presentation attacks (e.g., printed photos and video replays), 3D attacks are more challenging for face recognition systems (FRS) because they present 3D characteristics or materials similar to real faces. Existing 3D face spoofing databases, however, are mostly based on 3D masks and are restricted by small data size or poor authenticity due to production difficulty and high cost. In this work, we introduce the first wax figure face database, WFFD, as one type of super-realistic 3D presentation attack to spoof FRS. This database consists of 2200 images containing both real and wax figure faces (4400 faces in total) with high diversity from online collections. Experiments on this database first investigate the vulnerability of three popular FRS to this new kind of attack. Further, we evaluate the performance of several face presentation attack detection methods to show the attack capability of this super-realistic face spoofing database.