CLDec 5, 2025Code
FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal FeedbackXueqing Wu, Zihan Xue, Da Yin et al.
We present FronTalk, a benchmark for front-end code generation that pioneers the study of a unique interaction dynamic: conversational code generation with multi-modal feedback. In front-end development, visual artifacts such as sketches, mockups and annotated creenshots are essential for conveying design intent, yet their role in multi-turn code generation remains largely unexplored. To address this gap, we focus on the front-end development task and curate FronTalk, a collection of 100 multi-turn dialogues derived from real-world websites across diverse domains such as news, finance, and art. Each turn features both a textual instruction and an equivalent visual instruction, each representing the same user intent. To comprehensively evaluate model performance, we propose a novel agent-based evaluation framework leveraging a web agent to simulate users and explore the website, and thus measuring both functional correctness and user experience. Evaluation of 20 models reveals two key challenges that are under-explored systematically in the literature: (1) a significant forgetting issue where models overwrite previously implemented features, resulting in task failures, and (2) a persistent challenge in interpreting visual feedback, especially for open-source vision-language models (VLMs). We propose a strong baseline to tackle the forgetting issue with AceCoder, a method that critiques the implementation of every past instruction using an autonomous web agent. This approach significantly reduces forgetting to nearly zero and improves the performance by up to 9.3% (56.0% to 65.3%). Overall, we aim to provide a solid foundation for future research in front-end development and the general interaction dynamics of multi-turn, multi-modal code generation. Code and data are released at https://github.com/shirley-wu/frontalk
CLMar 22, 2024
Argument-Aware Approach To Event LinkingI-Hung Hsu, Zihan Xue, Nilay Pochh et al.
Event linking connects event mentions in text with relevant nodes in a knowledge base (KB). Prior research in event linking has mainly borrowed methods from entity linking, overlooking the distinct features of events. Compared to the extensively explored entity linking task, events have more complex structures and can be more effectively distinguished by examining their associated arguments. Moreover, the information-rich nature of events leads to the scarcity of event KBs. This emphasizes the need for event linking models to identify and classify event mentions not in the KB as ``out-of-KB,'' an area that has received limited attention. In this work, we tackle these challenges by introducing an argument-aware approach. First, we improve event linking models by augmenting input text with tagged event argument information, facilitating the recognition of key information about event mentions. Subsequently, to help the model handle ``out-of-KB'' scenarios, we synthesize out-of-KB training examples from in-KB instances through controlled manipulation of event arguments. Our experiment across two test datasets showed significant enhancements in both in-KB and out-of-KB scenarios, with a notable 22% improvement in out-of-KB evaluations.
CRAug 5, 2020
Randomized Last-Level Caches Are Still Vulnerable to Cache Side-Channel Attacks! But We Can Fix ItWei Song, Boya Li, Zihan Xue et al.
Cache randomization has recently been revived as a promising defense against conflict-based cache side-channel attacks. As two of the latest implementations, CEASER-S and ScatterCache both claim to thwart conflict-based cache side-channel attacks using randomized skewed caches. Unfortunately, our experiments show that an attacker can easily find a usable eviction set within the chosen remap period of CEASER-S and increasing the number of partitions without dynamic remapping, such as ScatterCache, cannot eliminate the threat. By quantitatively analyzing the access patterns left by various attacks in the LLC, we have newly discovered several problems with the hypotheses and implementations of randomized caches, which are also overlooked by the research on conflict-based cache side-channel attack. However, cache randomization is not a false hope and it is an effective defense that should be widely adopted in future processors. The newly discovered problems are corresponding to flaws associated with the existing implementation of cache randomization and are fixable. Several new defense techniques are proposed in this paper. our experiments show that all the newly discovered vulnerabilities of existing randomized caches are fixed within the current performance budget. We also argue that randomized set-associative caches can be sufficiently strengthened and possess a better chance to be actually adopted in commercial processors than their skewed counterparts as they introduce less overhaul to the existing cache structure.