IR Jul 12, 2024
Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion
Linhan Xia, Yicheng Yang, Ziou Chen et al.
Pre-trained models learn general representations from large datasets and can be fine-tuned for specific tasks, significantly reducing training time. Pre-trained models such as generative pre-trained transformers (GPT), bidirectional encoder representations from transformers (BERT), and vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi-modal movie recommendation system that extracts features from each movie's well-designed poster and its narrative text description. The system uses the BERT model to extract information from the text modality, the ViT model to extract information from the poster/image modality, and the Transformer architecture to fuse the features of all modalities and predict users' preferences. Integrating pre-trained foundation models with smaller datasets in downstream applications captures multi-modal content features more comprehensively, thereby providing more accurate recommendations. The efficiency of the proof-of-concept model is verified on the standard MovieLens 100K and 1M benchmark datasets. The prediction accuracy of user ratings improves over the baseline algorithm, demonstrating the potential of this cross-modal algorithm for movie or video recommendation.
LG Apr 17, 2025
Software Engineering Principles for Fairer Systems: Experiments with GroupCART
Kewen Peng, Hao Zhuo, Yicheng Yang et al.
Discrimination-aware classification aims to make accurate predictions while satisfying fairness constraints. Traditional decision tree learners typically optimize for information gain in the target attribute alone, which can result in models that unfairly discriminate against protected social groups (e.g., gender, ethnicity). Motivated by these shortcomings, we propose GroupCART, a tree-based ensemble optimizer that avoids bias during model construction by optimizing not only for decreased entropy in the target attribute but also for increased entropy in protected attributes. Our experiments show that GroupCART achieves fairer models without data transformation and with minimal performance degradation. Furthermore, the method supports customizable weighting, offering a smooth and flexible trade-off between predictive performance and fairness based on user requirements. These results demonstrate that algorithmic bias in decision tree models can be mitigated through multi-task, fairness-aware learning. All code and datasets used in this study are available at: https://github.com/anonymous12138/groupCART.
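The split criterion described in the abstract above (lower entropy in the target attribute, higher entropy in the protected attribute) can be sketched as a weighted score. This is a minimal illustration, not the authors' implementation: the function names, the linear combination, and the `fairness_weight` parameter are assumptions, since the abstract does not give the exact objective.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(labels, mask):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def split_score(target, protected, mask, fairness_weight=0.5):
    """Hypothetical fairness-aware score for one candidate split.

    Rewards information gain on the target and penalizes information gain
    on the protected attribute, so splits that keep protected groups mixed
    in the child nodes are preferred. The linear weighting is an assumption.
    """
    return ((1 - fairness_weight) * info_gain(target, mask)
            - fairness_weight * info_gain(protected, mask))


# Toy data: four samples with a binary target and a binary protected attribute.
target = [1, 1, 0, 0]
protected = ["a", "b", "a", "b"]
split_on_target = [True, True, False, False]     # separates the target only
split_on_protected = [True, False, True, False]  # separates the groups only

score_fair = split_score(target, protected, split_on_target)
score_unfair = split_score(target, protected, split_on_protected)
```

With `fairness_weight=0` the score reduces to ordinary information gain; raising it trades predictive purity for group mixing, which is one way to read the customizable weighting the abstract mentions.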
CV Nov 26, 2024
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
Yicheng Yang, Pengxiang Li, Lu Zhang et al.
Subject-driven image inpainting has recently gained prominence in image editing with the rapid advancement of diffusion models. Beyond image guidance, recent studies have explored incorporating text guidance to achieve identity-preserved yet locally editable object inpainting. However, these methods still suffer from identity overfitting, where original attributes remain entangled with target textual instructions. To overcome this limitation, we propose DreamMix, a diffusion-based framework adept at inserting target objects into user-specified regions while concurrently enabling arbitrary text-driven attribute modifications. DreamMix introduces three key components: (i) an Attribute Decoupling Mechanism (ADM) that synthesizes diverse attribute-augmented image-text pairs to mitigate overfitting; (ii) a Textual Attribute Substitution (TAS) module that isolates target attributes via orthogonal decomposition; and (iii) a Disentangled Inpainting Framework (DIF) that separates local generation from global harmonization. Extensive experiments across multiple inpainting backbones demonstrate that DreamMix achieves a superior balance between identity preservation and attribute editability across diverse applications, including object insertion, attribute editing, and small object inpainting.
CL Oct 13, 2025
Generate Logical Equivalence Questions
Xinyu Wang, Haoming Yu, Yicheng Yang et al.
Academic dishonesty is met with zero tolerance in higher education, yet plagiarism has become increasingly prevalent in the era of online teaching and learning. Automatic Question Generation (AQG) presents a potential solution to mitigate copying by creating unique questions for each student. Additionally, AQG can provide a vast array of practice questions. Our AQG focuses on generating logical equivalence questions for Discrete Mathematics, a foundational course for first-year computer science students. A literature review reveals that existing AQGs for this type of question generate all propositions that meet user-defined constraints, resulting in inefficiencies and a lack of uniform question difficulty. To address this, we propose a new approach that defines logical equivalence questions using a formal language, translates this language into two sets of generation rules, and develops a linear-time algorithm for question generation. We evaluated our AQG through two experiments. The first involved a group of students completing questions generated by our system. Statistical analysis shows that the accuracy of these questions is comparable to that of textbook questions. The second experiment assessed the number of steps required to solve our generated questions, textbook questions, and those generated by multiple large language models. The results indicated that the difficulty of our questions was similar to that of textbook questions, confirming the quality of our AQG.
CL Sep 24, 2025
Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections
Yicheng Yang, Zixian Li, Jean Paul Bizimana et al.
Pedestrian safety is a critical component of urban mobility and is strongly influenced by the interactions between pedestrian decision-making and driver yielding behavior at crosswalks. Modeling driver-pedestrian interactions at intersections requires accurately capturing the complexity of these behaviors. Traditional machine learning models often struggle to capture the nuanced and context-dependent reasoning required for these multifactorial interactions, due to their reliance on fixed feature representations and limited interpretability. In contrast, large language models (LLMs) are suited for extracting patterns from heterogeneous traffic data, enabling accurate modeling of driver-pedestrian interactions. Therefore, this paper leverages multimodal LLMs through a novel prompt design that incorporates domain-specific knowledge, structured reasoning, and few-shot prompting, enabling interpretable and context-aware inference of driver yielding behavior, as an example application of modeling driver-pedestrian interaction. We benchmarked state-of-the-art LLMs against traditional classifiers, finding that GPT-4o consistently achieves the highest accuracy and recall, while DeepSeek-V3 excels in precision. These findings highlight the critical trade-offs between model performance and computational efficiency, offering practical guidance for deploying LLMs in real-world pedestrian safety systems.
LG Apr 23, 2025
Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching
Kewen Peng, Yicheng Yang, Hao Zhuo
Fairness-aware learning aims to mitigate discrimination against specific protected social groups (e.g., those categorized by gender, ethnicity, age) while minimizing predictive performance loss. Despite efforts to improve fairness in machine learning, prior studies have shown that many models remain unfair when measured against various fairness metrics. In this paper, we examine whether the way training and testing data are sampled affects the reliability of reported fairness metrics. Since training and test sets are often randomly sampled from the same population, bias present in the training data may still exist in the test data, potentially skewing fairness assessments. To address this, we propose FairMatch, a post-processing method that applies propensity score matching to evaluate and mitigate bias. FairMatch identifies control and treatment pairs with similar propensity scores in the test set and adjusts decision thresholds for different subgroups accordingly. For samples that cannot be matched, we perform probabilistic calibration using fairness-aware loss functions. Experimental results demonstrate that our approach can (a) precisely locate subsets of the test data where the model is unbiased, and (b) significantly reduce bias on the remaining data. Overall, propensity score matching offers a principled way to improve both fairness evaluation and mitigation, without sacrificing predictive performance.
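The matching step described above, pairing treatment and control samples with similar propensity scores, can be illustrated with a small sketch. It assumes the propensity scores have already been estimated, and it uses greedy one-to-one nearest-neighbour matching with a caliper, a common variant chosen here for illustration; the abstract does not say which matching procedure FairMatch uses. The function name `match_pairs` and the `caliper` parameter are hypothetical, and the threshold adjustment and calibration steps are not shown.

```python
def match_pairs(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    scores  -- propensity score per sample (assumed precomputed)
    treated -- True for treatment-group samples, False for controls
    caliper -- maximum allowed score distance within a pair (assumption)

    Returns (pairs, unmatched): index pairs (treated_idx, control_idx) and
    the indices of treated samples that found no control within the caliper.
    """
    controls = [i for i, t in enumerate(treated) if not t]
    used = set()
    pairs, unmatched = [], []
    for i in (k for k, t in enumerate(treated) if t):
        best = None  # (distance, control index) of the closest free control
        for j in controls:
            if j in used:
                continue
            distance = abs(scores[i] - scores[j])
            if distance <= caliper and (best is None or distance < best[0]):
                best = (distance, j)
        if best is None:
            unmatched.append(i)
        else:
            used.add(best[1])
            pairs.append((i, best[1]))
    return pairs, unmatched


# Toy test set: three treated samples, two controls.
scores = [0.30, 0.32, 0.80, 0.79, 0.10]
treated = [True, False, True, False, True]
pairs, unmatched = match_pairs(scores, treated)
```

In this toy run, samples 0 and 2 each find a control within the caliper, while sample 4 stays unmatched; in the abstract's terms, such unmatched samples are the ones handled by probabilistic calibration.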
LG Apr 21, 2025
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering
Hao Zhuo, Yicheng Yang, Kewen Peng
Large Language Models (LLMs) have become integral to software engineering (SE), where they are increasingly used in development workflows. However, their widespread use raises concerns about the presence and propagation of toxic language: harmful or offensive content that can foster exclusionary environments. This paper provides a comprehensive review of recent research on toxicity detection and mitigation, focusing on both SE-specific and general-purpose datasets. We examine annotation and preprocessing techniques, assess detection methodologies, and evaluate mitigation strategies, particularly those leveraging LLMs. Additionally, we conduct an ablation study demonstrating the effectiveness of LLM-based rewriting for reducing toxicity. By synthesizing existing work and identifying open challenges, this review highlights key areas for future research to ensure the responsible deployment of LLMs in SE and beyond.
CL May 9, 2024
Automatic question generation for propositional logical equivalences
Yicheng Yang, Xinyu Wang, Haoming Yu et al.
The increase in academic dishonesty cases among college students has raised concern, particularly due to the shift towards online learning caused by the pandemic. We aim to develop and implement a method capable of generating tailored questions for each student. The use of Automatic Question Generation (AQG) is a possible solution. Previous studies have investigated AQG frameworks in education, which include validity, user-defined difficulty, and personalized problem generation. Our new AQG approach produces logical equivalence problems for Discrete Mathematics, which is a core course for year-one computer science students. This approach utilizes a syntactic grammar and a semantic attribute system through top-down parsing and syntax tree transformations. Our experiments show that the difficulty level of questions generated by our AQG approach is similar to the questions presented to students in the textbook [1]. These results confirm the practicality of our AQG approach for automated question generation in education, with the potential to significantly enhance learning experiences.
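To make the idea of syntax tree transformations concrete, the sketch below represents a proposition as a nested tuple, applies one equivalence-preserving rewrite (De Morgan's law), and confirms by truth table that the result is logically equivalent to the original. The tuple encoding, the function names, and the choice of De Morgan's law are illustrative assumptions; the paper's grammar and attribute system are not reproduced here.

```python
from itertools import product


def evaluate(node, env):
    """Evaluate a proposition tree under a variable assignment `env`."""
    if isinstance(node, str):  # a leaf is a variable name
        return env[node]
    op = node[0]
    if op == "not":
        return not evaluate(node[1], env)
    if op == "and":
        return evaluate(node[1], env) and evaluate(node[2], env)
    if op == "or":
        return evaluate(node[1], env) or evaluate(node[2], env)
    raise ValueError(f"unknown operator: {op}")


def variables(node):
    """Collect the set of variable names appearing in a tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(variables(child) for child in node[1:]))


def equivalent(a, b):
    """Truth-table check: do a and b agree under every assignment?"""
    names = sorted(variables(a) | variables(b))
    return all(
        evaluate(a, dict(zip(names, values))) == evaluate(b, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


def de_morgan(node):
    """Rewrite not(A and B) as (not A) or (not B), and the dual form.

    Any other tree is returned unchanged.
    """
    if isinstance(node, tuple) and node[0] == "not" and isinstance(node[1], tuple):
        inner = node[1]
        if inner[0] == "and":
            return ("or", ("not", inner[1]), ("not", inner[2]))
        if inner[0] == "or":
            return ("and", ("not", inner[1]), ("not", inner[2]))
    return node


# not (p and (q or r))  is rewritten to  (not p) or (not (q or r))
lhs = ("not", ("and", "p", ("or", "q", "r")))
rhs = de_morgan(lhs)
is_valid_question = equivalent(lhs, rhs)
```

A generator built this way could present `lhs` and `rhs` as the two sides of a "show that these are equivalent" question. The truth-table check is exponential in the number of variables, so it serves here only as a validity check on small examples.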