LG, Sep 24, 2024
iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification
Yuanzhe Jin, Adrian Carrasco-Revilla, Min Chen
In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes in data and tasks. In this paper, we present a solution that uses visual analytics (VA) to guide the generation of synthetic data with large language models. Because VA enables model developers to identify data-related deficiencies, data synthesis can be targeted to address those deficiencies. We discuss different types of data deficiency, describe VA techniques that support their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text classification models.
CL, Aug 4, 2025
AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification
Chenhao Xue, Yuanzhe Jin, Adrian Carrasco-Revilla et al.
When developing text classification models for real-world applications, one major challenge is the difficulty of collecting sufficient data for all text classes. In this work, we address this challenge by using large language models (LLMs) to generate synthetic data and using that data to improve model performance without waiting for more real data to be collected and labelled. Because an LLM generates different synthetic data in response to different input examples, we formulate an automated workflow that searches for input examples leading to more "effective" synthetic data for improving the model concerned. We study three search strategies in an extensive set of experiments, and use the experimental results to inform an ensemble algorithm that selects a search strategy according to the characteristics of a class. Our further experiments demonstrate that this ensemble approach is more effective than each individual strategy in our automated workflow for improving classification models using LLMs.
LG, Jul 30, 2025
VAR: Visual Analysis for Rashomon Set of Machine Learning Models' Performance
Yuanzhe Jin
Evaluating the performance of closely matched machine learning (ML) models under specific conditions has long been a focus of ML research. The Rashomon set is a collection of closely matched ML models, encompassing a wide range of models with similar accuracies but different structures. Traditionally, the analysis of these sets has focused on vertical structural analysis, which involves comparing the corresponding features at various levels within the ML models. However, there has been a lack of effective visualization methods for horizontally comparing multiple models with specific features. We propose VAR, a visualization solution for comparing ML models within the Rashomon set. It combines heatmaps and scatter plots to facilitate the comparison. With the help of VAR, ML model developers can identify the optimal model under specific conditions and better understand the overall characteristics of the Rashomon set.
RO, Aug 20, 2021
The Importance of Autonomous Driving Using 5G Technology
Yuanzhe Jin, Neelanshi Varia, Chixiang Wang
The three keys to autonomous driving are sensors, data integration, and 100% safety decisions. In the past, because of the high latency and low reliability of the network, many decisions had to be made locally in the vehicle. This places high demands on the vehicle itself, which has slowed the commercialization of autonomous driving. With the advent of 5G, this situation will be greatly improved. In this paper, we present the improvements that 5G technology brings to autonomous vehicles, especially in terms of latency and reliability, among a multitude of other factors. The paper analyzes the specific areas where 5G can bring improvements to autonomous vehicles and to Intelligent Transport Systems (ITS) in general, and looks ahead to future applications of 5G technology.
CR, May 12, 2021
Tomen: Application of Bitcoin Transaction Based on Tor
Yuanzhe Jin, Ziheng Dong, Xing Li
Bitcoin emerged in 2008, and after more than a decade of development, it has become the largest trading currency by far. The core of the blockchain is to ensure the anonymity of user transactions. As more and more analysis algorithms for blockchain transactions appear, the anonymity of the blockchain is increasingly threatened. We propose Tomen, an application that encrypts the communication involved in Bitcoin transactions, based on the encryption principles of Tor. The goal is to anonymize Bitcoin transaction communication.