SEApr 25, 2023
Empirical Evaluation of ChatGPT on Requirements Information Retrieval Under Zero-Shot SettingJianzhang Zhang, Yiyang Chen, Nan Niu et al.
Recently, various illustrative examples have shown the impressive ability of generative large language models (LLMs) to perform NLP related tasks. ChatGPT undoubtedly is the most representative model. We empirically evaluate ChatGPT's performance on requirements information retrieval (IR) tasks to derive insights into designing or developing more effective requirements retrieval methods or tools based on generative LLMs. We design an evaluation framework considering four different combinations of two popular IR tasks and two common artifact types. Under zero-shot setting, evaluation results reveal ChatGPT's promising ability to retrieve requirements relevant information (high recall) and limited ability to retrieve more specific requirements information (low precision). Our evaluation of ChatGPT on requirements IR under zero-shot setting provides preliminary evidence for designing or developing more effective requirements IR methods or tools based on LLMs.
SEMar 30
Enhancing User-Feedback Driven Requirements PrioritizationAurek Chattopadhyay, Nan Niu, Hui Liu et al.
Context: Requirements prioritization is a challenging problem that is aimed to deliver the most suitable subset from a pool of candidate requirements. The problem is NP-hard when formulated as an optimization problem. Feedback from end users can offer valuable support for software evolution, and ReFeed represents a state-of-the-art in automatically inferring a requirement's priority via quantifiable properties of the feedback messages associated with a candidate requirement. Objectives: In this paper, we enhance ReFeed by shifting the focus of prioritization from treating requirements as independent entities toward interconnecting them. Additionally, we explore if interconnecting requirements provides additional value for search-based solutions. Methods: We leverage user feedback from mobile app store to group requirements into topically coherent clusters. Such interconnectedness, in turn, helps to auto-generate additional "requires" relations in candidate requirements. These "requires" pairs are then integrated into a search-based software engineering solution. Results: The experiments on 94 requirements prioritization instances from four real-world software applications show that our enhancement outperforms ReFeed. In addition, we illustrate how incorporating interconnectedness among requirements improves search-based solutions. Conclusion: Our findings show that requirements interconnectedness improves user feedback driven requirements prioritization, helps uncover additional "requires" relations in candidate requirements, and also strengthens search-based release planning.
CVMar 18
Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story GenerationJianzhang Zhang, Yijing Tian, Jiwang Qu et al.
Story visualization requires generating sequential imagery that aligns semantically with evolving narratives while maintaining rigorous consistency in character identity and visual style. However, existing methodologies often struggle with subject inconsistency and identity drift, particularly when depicting complex interactions or extended narrative arcs. To address these challenges, we propose a cohesive two-stage framework designed for robust and consistent story generation. First, we introduce Group-Shared Attention (GSA), a mechanism that fosters intrinsic consistency by enabling lossless cross-sample information flow within attention layers. This allows the model to structurally encode identity correspondence across frames without relying on external encoders. Second, we leverage Direct Preference Optimization (DPO) to align generated outputs with human aesthetic and narrative standards. Unlike conventional methods that rely on conflicting auxiliary losses, our approach simultaneously enhances visual fidelity and identity preservation by learning from holistic preference data. Extensive evaluations on the ViStoryBench benchmark demonstrate that our method establishes a new state-of-the-art, significantly outperforming strong baselines with gains of +10.0 in Character Identity (CIDS) and +18.7 in Style Consistency (CSD), all while preserving high-fidelity generation.
AISep 24, 2025Code
OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language ModelsJianzhang Zhang, Jialong Zhou, Chuang Liu
Large language models (LLMs) demonstrate strong mathematical reasoning, but reliance on closed-source APIs for OR tasks raises privacy concerns, and training open-source models from scratch incurs high compute costs. We introduce OR-Toolformer, which fine-tunes Llama-3.1-8B-Instruct with a semi-automatic data synthesis pipeline that generates diverse OR problem-answer pairs and augments the model with external solvers to produce API calls. On three of four standard benchmarks, OR-Toolformer achieves up to 80.1% execution accuracy, exceeding size-matched baselines by over 4.3%. In zero-shot evaluation on two unseen OR problem types, it attains 54% average accuracy, a 21 percentage-point improvement over the strongest baseline. These findings validate the efficacy of tool-augmented fine-tuning LLMs for accurate and generalizable OR problem modeling and solving.
CVAug 18, 2025
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative ModelsJianshu Zeng, Yuxuan Liu, Yutong Feng et al.
Video relighting is a challenging yet valuable task, aiming to replace the background in videos while correspondingly adjusting the lighting in the foreground with harmonious blending. During translation, it is essential to preserve the original properties of the foreground, e.g., albedo, and propagate consistent relighting among temporal frames. In this paper, we propose Lumen, an end-to-end video relighting framework developed on large-scale video generative models, receiving flexible textual description for instructing the control of lighting and background. Considering the scarcity of high-qualified paired videos with the same foreground in various lighting conditions, we construct a large-scale dataset with a mixture of realistic and synthetic videos. For the synthetic domain, benefiting from the abundant 3D assets in the community, we leverage advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adapt a HDR-based lighting simulation to complement the lack of paired in-the-wild videos. Powered by the aforementioned dataset, we design a joint training curriculum to effectively unleash the strengths of each domain, i.e., the physical consistency in synthetic videos, and the generalized domain distribution in realistic videos. To implement this, we inject a domain-aware adapter into the model to decouple the learning of relighting and domain appearance distribution. We construct a comprehensive benchmark to evaluate Lumen together with existing methods, from the perspectives of foreground preservation and video consistency assessment. Experimental results demonstrate that Lumen effectively edit the input into cinematic relighted videos with consistent lighting and strict foreground preservation. Our project page: https://lumen-relight.github.io/