CLSep 16, 2023
Has Sentiment Returned to the Pre-pandemic Level? A Sentiment Analysis Using U.S. College Subreddit Data from 2019 to 2022Tian Yan, Fang Liu
As impact of COVID-19 pandemic winds down, both individuals and society gradually return to pre-pandemic activities. This study aims to explore how people's emotions have changed from the pre-pandemic during the pandemic to post-emergency period and whether it has returned to pre-pandemic level. We collected Reddit data in 2019 (pre-pandemic), 2020 (peak pandemic), 2021, and 2022 (late stages of pandemic, transitioning period to post-emergency period) from subreddits in 128 universities/colleges in the U.S., and a set of school-level characteristics. We predicted two sets of sentiments from a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) and graph attention network (GAT) that leverages both rich semantic and relational information among posted messages and then applied a logistic stacking method to obtain the final sentiment classification. After obtaining sentiment label for each message, we used a generalized linear mixed-effects model to estimate temporal trend in sentiment from 2019 to 2022 and how school-level factors may affect sentiment. Compared to the year 2019, the odds of negative sentiment in years 2020, 2021, and 2022 are 24%, 4.3%, and 10.3% higher, respectively, which are all statistically significant(adjusted $p$<0.05). Our study findings suggest a partial recovery in the sentiment composition in the post-pandemic-emergency era. The results align with common expectations and provide a detailed quantification of how sentiments have evolved from 2019 to 2022.
AROct 10, 2023Code
Gem5Pred: Predictive Approaches For Gem5 Simulation TimeTian Yan, Xueyang Li, Sifat Ut Taki et al.
Gem5, an open-source, flexible, and cost-effective simulator, is widely recognized and utilized in both academic and industry fields for hardware simulation. However, the typically time-consuming nature of simulating programs on Gem5 underscores the need for a predictive model that can estimate simulation time. As of now, no such dataset or model exists. In response to this gap, this paper makes a novel contribution by introducing a unique dataset specifically created for this purpose. We also conducted analysis of the effects of different instruction types on the simulation time in Gem5. After this, we employ three distinct models leveraging CodeBERT to execute the prediction task based on the developed dataset. Our superior regression model achieves a Mean Absolute Error (MAE) of 0.546, while our top-performing classification model records an Accuracy of 0.696. Our models establish a foundation for future investigations on this topic, serving as benchmarks against which subsequent models can be compared. We hope that our contribution can simulate further research in this field. The dataset we used is available at https://github.com/XueyangLiOSU/Gem5Pred.
CVMar 4
Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation ModelingWenguang Tao, Xiaotian Wang, Tian Yan et al.
Small object detection under complex backgrounds remains a challenging task due to severe feature degradation, weak semantic representation, and inaccurate localization caused by downsampling operations and background interference. Existing detection frameworks are mainly designed for general objects and often fail to explicitly address the unique characteristics of small objects, such as limited structural cues and strong sensitivity to localization errors. In this paper, we propose a multi-level feature enhancement and global relation modeling framework tailored for small object detection. Specifically, a Residual Haar Wavelet Downsampling module is introduced to preserve fine-grained structural details by jointly exploiting spatial-domain convolutional features and frequency-domain representations. To enhance global semantic awareness and suppress background noise, a Global Relation Modeling module is employed to capture long-range dependencies at high-level feature stages. Furthermore, a Cross-Scale Hybrid Attention module is designed to establish sparse and aligned interactions across multi-scale features, enabling effective fusion of high-resolution details and high-level semantic information with reduced computational overhead. Finally, a Center-Assisted Loss is incorporated to stabilize training and improve localization accuracy for small objects. Extensive experiments conducted on the large-scale RGBT-Tiny benchmark demonstrate that the proposed method consistently outperforms existing state-of-the-art detectors under both IoU-based and scale-adaptive evaluation metrics. These results validate the effectiveness and robustness of the proposed framework for small object detection in complex environments.
CVAug 18, 2025
SocialTrack: Multi-Object Tracking in Complex Urban Traffic Scenes Inspired by Social BehaviorWenguang Tao, Xiaotian Wang, Tian Yan et al.
As a key research direction in the field of multi-object tracking (MOT), UAV-based multi-object tracking has significant application value in the analysis and understanding of urban intelligent transportation systems. However, in complex UAV perspectives, challenges such as small target scale variations, occlusions, nonlinear crossing motions, and motion blur severely hinder the stability of multi-object tracking. To address these challenges, this paper proposes a novel multi-object tracking framework, SocialTrack, aimed at enhancing the tracking accuracy and robustness of small targets in complex urban traffic environments. The specialized small-target detector enhances the detection performance by employing a multi-scale feature enhancement mechanism. The Velocity Adaptive Cubature Kalman Filter (VACKF) improves the accuracy of trajectory prediction by incorporating a velocity dynamic modeling mechanism. The Group Motion Compensation Strategy (GMCS) models social group motion priors to provide stable state update references for low-quality tracks, significantly improving the target association accuracy in complex dynamic environments. Furthermore, the Spatio-Temporal Memory Prediction (STMP) leverages historical trajectory information to predict the future state of low-quality tracks, effectively mitigating identity switching issues. Extensive experiments on the UAVDT and MOT17 datasets demonstrate that SocialTrack outperforms existing state-of-the-art (SOTA) methods across several key metrics. Significant improvements in MOTA and IDF1, among other core performance indicators, highlight its superior robustness and adaptability. Additionally, SocialTrack is highly modular and compatible, allowing for seamless integration with existing trackers to further enhance performance.
CLNov 30, 2021
Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit DataTian Yan, Fang Liu
Background: The COVID-19 pandemic has affected our society and human well-being in various ways. In this study, we investigate how the pandemic has influenced people's emotions and psychological states compared to a pre-pandemic period using real-world data from social media. Method: We collected Reddit social media data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddits communities associated with eight universities. We applied the pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) to learn text embedding from the Reddit messages, and leveraged the relational information among posted messages to train a graph attention network (GAT) for sentiment classification. Finally, we applied model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final classification on sentiment. With the model-predicted sentiment labels on the collected data, we used a generalized linear mixed-effects model to estimate the effects of pandemic and in-person teaching during the pandemic on sentiment. Results: The results suggest that the odds of negative sentiments in 2020 (pandemic) were 25.7% higher than the odds in 2019 (pre-pandemic) with a $p$-value $<0.001$; and the odds of negative sentiments associated in-person learning were 48.3% higher than with remote learning in 2020 with a $p$-value of 0.029. Conclusions: Our study results are consistent with the findings in the literature on the negative impacts of the pandemic on people's emotions and psychological states. Our study contributes to the growing real-world evidence on the various negative impacts of the pandemic on our society; it also provides a good example of using both ML techniques and statistical modeling and inference to make better use of real-world data.
CRJun 18, 2021
Some Examples of Privacy-preserving Publication and Sharing of COVID-19 Pandemic DataFang Liu, Dong Wang, Tian Yan
A considerable amount of various types of data have been collected during the COVID-19 pandemic, the analysis and interpretation of which have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be rich sources for further studying and understanding the impacts of the pandemic on various aspects of our society. On the other hand, naïve release and sharing of the information can be associated with serious privacy concerns. In this study, we use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches in real-life data. All the approaches employed in the study are straightforward to apply. Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance the statistical utility of released information during this process.