Fumeng Yang

CL
h-index: 42
6 papers
23 citations
Novelty: 46%
AI Score: 51

6 Papers

HC · Feb 26
Trace-Aware Workflows for Co-Creating Branded Content with Generative AI

Taehyun Yang, Eunhye Kim, Zhongzheng Xu et al.

Generative AI tools have lowered barriers to producing branded social media images and captions, yet small-business owners (SBOs) still struggle to create on-brand posts without access to professional designers or marketing consultants. Although these tools enable fast image generation from text prompts, aligning outputs with a brand's intended look and feel remains a demanding, iterative task. In this position paper, we explore how SBOs navigate iterative content creation and how AI-assisted systems can support SBOs' content creation workflow. We conducted a preliminary study with 12 SBOs who independently manage their businesses and social media presence, using a questionnaire to collect their branding practices, content workflows, and use of generative AI alongside conventional design tools. We identified three recurring challenges: (1) translating brand "feel" into effective prompts, (2) difficulty revisiting and comparing prior image generations, and (3) difficulty making sense of changes between iterations to steer refinement. Based on these findings, we present a prototype that scaffolds brand articulation, supports feedback-informed exploration, and maintains a traceboard of branching image iterations. Our work illustrates how traces of the iterative process can serve as workflow support that helps SBOs keep track of explorations, make sense of changes, and refine content.

39.7 · CL · Apr 26
Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms

Dayeon Ki, Yu Hou, Rachel Rudinger et al.

Neologisms and emerging slang are central to daily conversation, yet challenging for non-native speakers (NNS) to interpret and use appropriately in cross-cultural communication with native speakers (NS). NNS increasingly make use of Artificial Intelligence (AI) tools to learn these words. We study the utility of such tools in mediating an informal communication scenario through a human-subjects study (N=234): NNS participants learn English neologisms with AI support, write messages using the learned word to an NS friend, and judge contextual appropriateness of the neologism in two provided writing samples. Using both NS evaluator-rated communicative competence of NNS-produced writing and NNS' contextual appropriateness judgments, we compare three AI-based support conditions: AI Definition, AI Rewrite into simpler English, AI Explanation of meaning and usage, and Non-AI Dictionary for comparison. We show that AI Explanation yields the largest gains over no support in NS-rated competence, while contextual appropriateness judgments do not differ across support conditions. NNS participants' self-reported perceptions tend to overestimate NS ratings, revealing a mismatch between perceived and actual competence. We further observe a significant gap between NNS- and NS-produced writing, highlighting the limitations of current AI tools and informing design for future tools.

CL · Jan 20, 2025
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas

Nishant Balepur, Vishakh Padmakumar, Fumeng Yang et al.

LLMs are aligned to follow input instructions by learning which of two responses users prefer for a prompt. However, such preference data do not convey why users prefer responses that are chosen or rejected, so LLMs trained on these datasets cannot tailor responses to varied user needs. To surface these parameters of personalization, we apply abductive reasoning to preference data, inferring needs and interests of users, i.e., personas, that may prefer either response. We test this idea in two steps: Persona Inference (PI), abductively inferring personas of users who prefer chosen or rejected outputs, and Persona Tailoring (PT), training models to tailor outputs to personas from PI. We show: 1) LLMs infer personas accurately explaining why different users may prefer either chosen or rejected outputs; 2) Training on preference data augmented with PI personas via PT boosts personalization and generalizes to supporting user-written personas; and 3) Rejected response personas form harder personalization evaluations, showing PT better aids users with uncommon preferences versus typical alignment methods. We argue for an abductive view of preferences for personalization, asking not only which response is better but when, why, and for whom.

CL · Sep 23, 2025
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users

Nishant Balepur, Matthew Shu, Yoo Yeon Sung et al. · allen-ai, oxford

To assist users in complex tasks, LLMs generate plans: step-by-step instructions towards a goal. While alignment methods aim to ensure LLM plans are helpful, they train (RLHF) or evaluate (ChatbotArena) on what users prefer, assuming this reflects what helps them. We test this with Planorama: an interface where 126 users answer 300 multi-step questions with LLM plans. We get 4388 plan executions and 5584 comparisons to measure plan helpfulness (QA success) and user preferences on plans, and recreate the setup in agents and reward models to see if they simulate or prefer what helps users. We expose: 1) user/model preferences and agent success do not accurately predict which plans help users, so common alignment feedback can misalign with helpfulness; 2) this gap is not due to user-specific preferences, as users are similarly successful when using plans they prefer/disprefer; 3) surface-level cues like brevity and question similarity strongly link to preferences, but such biases fail to predict helpfulness. In all, we argue aligning helpful LLMs needs feedback from real user interactions, not just preferences of what looks helpful, so we discuss the plan NLP researchers can execute to solve this problem.

CV · Jul 28, 2025
Self-Supervised Continuous Colormap Recovery from a 2D Scalar Field Visualization without a Legend

Hongxu Liu, Xinyu Chen, Haoyang Zheng et al.

Recovering a continuous colormap from a single 2D scalar field visualization can be quite challenging, especially in the absence of a corresponding color legend. In this paper, we propose a novel colormap recovery approach that extracts the colormap from a color-encoded 2D scalar field visualization by simultaneously predicting the colormap and underlying data using a decoupling-and-reconstruction strategy. Our approach first separates the input visualization into colormap and data using a decoupling module, then reconstructs the visualization with a differentiable color-mapping module. To guide this process, we design a reconstruction loss between the input and reconstructed visualizations, which serves both as a constraint to ensure strong correlation between colormap and data during training, and as a self-supervised optimizer for fine-tuning the predicted colormap of unseen visualizations during inference. To ensure smoothness and correct color ordering in the extracted colormap, we introduce a compact colormap representation using cubic B-spline curves and an associated color order loss. We evaluate our method quantitatively and qualitatively on a synthetic dataset and a collection of real-world visualizations from the VIS30K dataset. Additionally, we demonstrate its utility in two prototype applications -- colormap adjustment and colormap transfer -- and explore its generalization to visualizations with color legends and ones encoded using discrete color palettes.

HC · Jul 23, 2021
Rethinking the Ranks of Visual Channels

Caitlyn M. McColeman, Fumeng Yang, Steven Franconeri et al.

Data can be visually represented using visual channels like position, length or luminance. An existing ranking of these visual channels is based on how accurately participants could report the ratio between two depicted values. There is an assumption that this ranking should hold for different tasks and for different numbers of marks. However, there is little existing work testing this assumption, especially given that visually computing ratios is relatively unimportant in real-world visualizations, compared to seeing, remembering, and comparing trends and motifs, across displays that almost universally depict more than two values. We asked participants to immediately reproduce a set of values from memory. With a Bayesian multilevel modeling approach, we observed how the relevant rank positions of visual channels shift across different numbers of marks (2, 4 or 8) and for bias, precision, and error measures. The ranking did not hold, even for reproductions of only 2 marks, and the new ranking was highly inconsistent for reproductions of different numbers of marks. Other factors besides channel choice had far more influence on performance, such as the number of values in the series (e.g. more marks led to larger errors), or the value of each mark (e.g. small values are systematically overestimated). Recall was worse for displays with 8 marks than 4, consistent with established limits on visual memory. These results show that we must move beyond two-value ratio judgments as a baseline for ranking the quality of a visual channel, including testing new tasks (detection of trends or motifs), timescales (immediate computation, or later comparison), and the number of values (from a handful, to thousands).