Zhiyu Lin

AI
h-index35
29papers
1,023citations
Novelty47%
AI Score57

29 Papers

AIAug 4, 2023
A Controllable Co-Creative Agent for Game System Design

Rohan Agarwal, Zhiyu Lin, Mark Riedl · gatech

Many advancements have been made in procedural content generation for games, and with mixed-initiative co-creativity, have the potential for great benefits to human designers. However, co-creative systems for game generation are typically limited to specific genres, rules, or games, limiting the creativity of the designer. We seek to model games abstractly enough to apply to any genre, focusing on designing game systems and mechanics, and create a controllable, co-creative agent that can collaborate on these designs. We present a model of games using state-machine-like components and resource flows, a set of controllable metrics, a design evaluator simulating playthroughs with these metrics, and an evolutionary design balancer and generator. We find this system to be both able to express a wide range of games and able to be human-controllable for future co-creative applications.

AIMar 23, 2022
NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty

Jonathan Balloch, Zhiyu Lin, Mustafa Hussain et al. · gatech

A robust body of reinforcement learning techniques have been developed to solve complex sequential decision making problems. However, these methods assume that train and evaluation tasks come from similarly or identically distributed environments. This assumption does not hold in real life where small novel changes to the environment can make a previously learned policy fail or introduce simpler solutions that might never be found. To that end we explore the concept of {\em novelty}, defined in this work as the sudden change to the mechanics or properties of environment. We provide an ontology of for novelties most relevant to sequential decision making, which distinguishes between novelties that affect objects versus actions, unary properties versus non-unary relations, and the distribution of solutions to a task. We introduce NovGrid, a novelty generation framework built on MiniGrid, acting as a toolkit for rapidly developing and evaluating novelty-adaptation-enabled reinforcement learning techniques. Along with the core NovGrid we provide exemplar novelties aligned with our ontology and instantiate them as novelty templates that can be applied to many MiniGrid-compliant environments. Finally, we present a set of metrics built into our framework for the evaluation of novelty-adaptation-enabled machine-learning techniques, and show characteristics of a baseline RL model using these metrics.

AIJan 16, 2023
Neuro-Symbolic World Models for Adapting to Open World Novelty

Jonathan Balloch, Zhiyu Lin, Robert Wright et al. · gatech

Open-world novelty--a sudden change in the mechanics or properties of an environment--is a common occurrence in the real world. Novelty adaptation is an agent's ability to improve its policy performance post-novelty. Most reinforcement learning (RL) methods assume that the world is a closed, fixed process. Consequentially, RL policies adapt inefficiently to novelties. To address this, we introduce WorldCloner, an end-to-end trainable neuro-symbolic world model for rapid novelty adaptation. WorldCloner learns an efficient symbolic representation of the pre-novelty environment transitions, and uses this transition model to detect novelty and efficiently adapt to novelty in a single-shot fashion. Additionally, WorldCloner augments the policy learning process using imagination-based adaptation, where the world model simulates transitions of the post-novelty environment to help the policy adapt. By blending ''imagined'' transitions with interactions in the post-novelty environment, performance can be recovered with fewer total environment interactions. Using environments designed for studying novelty in sequential decision-making problems, we show that the symbolic world model helps its neural policy adapt more efficiently than model-based and model-based neural-only reinforcement learning methods.

AIAug 4, 2022
Creative Wand: A System to Study Effects of Communications in Co-Creative Settings

Zhiyu Lin, Rohan Agarwal, Mark Riedl · gatech

Recent neural generation systems have demonstrated the potential for procedurally generating game content, images, stories, and more. However, most neural generation algorithms are "uncontrolled" in the sense that the user has little say in creative decisions beyond the initial prompt specification. Co-creative, mixed-initiative systems require user-centric means of influencing the algorithm, especially when users are unlikely to have machine learning expertise. The key to co-creative systems is the ability to communicate ideas and intent from the user to the agent, as well as from the agent to the user. Key questions in co-creative AI include: How can users express their creative intentions? How can creative AI systems communicate their beliefs, explain their moves, or instruct users to act on their behalf? When should creative AI systems take initiative? The answer to such questions and more will enable us to develop better co-creative systems that make humans more capable of expressing their creative intents. We introduce CREATIVE-WAND, a customizable framework for investigating co-creative mixed-initiative generation. CREATIVE-WAND enables plug-and-play injection of generative models and human-agent communication channels into a chat-based interface. It provides a number of dimensions along which an AI generator and humans can communicate during the co-creative process. We illustrate the CREATIVE-WAND framework by using it to study one dimension of co-creative communication-global versus local creative intent specification by the user-in the context of storytelling.

HCMar 27
Unlocking Open-Player-Modeling-enhanced Game-Based Learning: The Open Player Socially Analytical Intelligence Architecture

Zhiyu Lin, Boyd Fox, Devon Mckee et al. · gatech

Game-Based Learning (GBL) is a learner-engaging pedagogical methodology, yet adapting games to heterogeneous learners requires transparent, real-time Open Player Models (OPMs). We contribute to the community Open Player Socially Analytical Intelligence (OPSAI), an architecture implementing OPM beyond conceptual frameworks and validated in a GBL application. It decouples gameplay telemetry and analysis from the game engine and automatically derives pedagogically actionable insights, supporting the transparency of computational player models while making them accessible to players. OPSAI comprises three logical layers: a Frontend that both provides the GBL experience and collects information needed for analytics; a stateless Backend that hosts transparent analytics services producing reflective prompts, recommendations, and visualization guides; and a two-tier Log Storage that balances heavy raw gameplay data with lightweight reference indices for low-latency queries. By feeding analytics outputs back into the game interface, OPSAI closes the feedback loop between play and learning, empowering teachers, researchers, and learners alike. We further showcase OPSAI with a full deployment on the Parallel GBL environment, featuring live play traces, peer comparisons, and personalized suggestions, demonstrating a reusable blueprint for future educational games.

CVMay 6, 2022
Investigating and Explaining the Frequency Bias in Image Classification

Zhiyu Lin, Yifei Gao, Jitao Sang

CNNs exhibit many behaviors different from humans, one of which is the capability of employing high-frequency components. This paper discusses the frequency bias phenomenon in image classification tasks: the high-frequency components are actually much less exploited than the low- and mid-frequency components. We first investigate the frequency bias phenomenon by presenting two observations on feature discrimination and learning priority. Furthermore, we hypothesize that (i) the spectral density, (ii) class consistency directly affect the frequency bias. Specifically, our investigations verify that the spectral density of datasets mainly affects the learning priority, while the class consistency mainly affects the feature discrimination.

AIOct 11, 2023
An Ontology of Co-Creative AI Systems

Zhiyu Lin, Mark Riedl · gatech

The term co-creativity has been used to describe a wide variety of human-AI assemblages in which human and AI are both involved in a creative endeavor. In order to assist with disambiguating research efforts, we present an ontology of co-creative systems, focusing on how responsibilities are divided between human and AI system and the information exchanged between them. We extend Lubart's original ontology of creativity support tools with three new categories emphasizing artificial intelligence: computer-as-subcontractor, computer-as-critic, and computer-as-teammate, some of which have sub-categorizations.

CVJun 3, 2023
Towards Black-box Adversarial Example Detection: A Data Reconstruction-based Method

Yifei Gao, Zhiyu Lin, Yunfan Yang et al.

Adversarial example detection is known to be an effective adversarial defense method. Black-box attack, which is a more realistic threat and has led to various black-box adversarial training-based defense methods, however, does not attract considerable attention in adversarial example detection. In this paper, we fill this gap by positioning the problem of black-box adversarial example detection (BAD). Data analysis under the introduced BAD settings demonstrates (1) the incapability of existing detectors in addressing the black-box scenario and (2) the potential of exploring BAD solutions from a data perspective. To tackle the BAD problem, we propose a data reconstruction-based adversarial example detection method. Specifically, we use variational auto-encoder (VAE) to capture both pixel and frequency representations of normal examples. Then we use reconstruction error to detect adversarial examples. Compared with existing detection methods, the proposed method achieves substantially better detection performance in BAD, which helps promote the deployment of adversarial example detection-based defense solutions in real-world models.

HCSep 6, 2024
Beyond Following: Mixing Active Initiative into Computational Creativity

Zhiyu Lin, Upol Ehsan, Rohan Agarwal et al. · gatech

Generative Artificial Intelligence (AI) encounters limitations in efficiency and fairness within the realm of Procedural Content Generation (PCG) when human creators solely drive and bear responsibility for the generative process. Alternative setups, such as Mixed-Initiative Co-Creative (MI-CC) systems, exhibited their promise. Still, the potential of an active mixed initiative, where AI takes a role beyond following, is understudied. This work investigates the influence of the adaptive ability of an active and learning AI agent on creators' expectancy of creative responsibilities in an MI-CC setting. We built and studied a system that employs reinforcement learning (RL) methods to learn the creative responsibility preferences of a human user during online interactions. Situated in story co-creation, we develop a Multi-armed-bandit agent that learns from the human creator, updates its collaborative decision-making belief, and switches between its capabilities during an MI-CC experience. With 39 participants joining a human subject study, Our developed system's learning capabilities are well recognized compared to the non-learning ablation, corresponding to a significant increase in overall satisfaction with the MI-CC experience. These findings indicate a robust association between effective MI-CC collaborative interactions, particularly the implementation of proactive AI initiatives, and deepened understanding among all participants.

CVApr 10Code
Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition

Yuxi Zhou, Zhengbo Zhang, Jingyu Pan et al.

Human action recognition is pivotal in computer vision, with applications ranging from surveillance to human-robot interaction. Despite the effectiveness of supervised skeleton-based methods, their reliance on exhaustive annotation limits generalization to novel actions. Zero-Shot Skeleton Action Recognition (ZSAR) emerges as a promising paradigm, yet it faces challenges due to the spectral bias of diffusion models, which oversmooth high-frequency dynamics. Here, we propose Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), integrating a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and Curriculum-based Semantic Abstraction to address these challenges. Our approach effectively recovers fine-grained motion details, achieving state-of-the-art performance on NTU RGB+D, PKU-MMD, and Kinetics-skeleton datasets. Code has been made available at https://github.com/yuzhi535/FDSM. Project homepage: https://yuzhi535.github.io/FDSM.github.io/

CRDec 7, 2022
Artificial Intelligence Security Competition (AISC)

Yinpeng Dong, Peng Chen, Senyou Deng et al.

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.

CVMay 16
Towards Generalized Image Manipulation Localization via Score-based Model

Yunfei Wang, Bo Du, Zhe Yang et al.

With the rapid evolution of synthetic media, Image Manipulation Localization (IML) has emerged as a critical component in multimedia forensics for ensuring the integrity of digital content. However, generalization remains a core challenge, as existing discriminative methods typically learn a fixed decision boundary that tends to overfit to specific training artifacts and fails to adapt to unseen manipulation types. To address this, we propose DiffIML, a novel framework that introduces score-based generative modeling to IML. Diverging from the direct estimation of hard boundaries, DiffIML approximates the score function, the gradient of the log-likelihood, to capture the intrinsic geometric topology of mask distributions. This paradigm leverages structural priors to iteratively recover coherent masks from noise, thereby circumventing the brittleness associated with discriminative models. Under this formulation, diffusion models serve as an effective numerical solver for the learned score function.To ensure practicality, we respectively resolve the efficiency and stability bottlenecks of standard diffusion by: (1) utilizing a Lightweight Mask-Specific VAE for fast latent-space process and a decoupled architecture with a lightweight denoising UNet, (2) edge supervision and error prior to mitigate error accumulation during sampling. Extensive experiments of two distinct protocols on eight non-generative and three generative benchmarks demonstrate that DiffIML consistently outperforms state-of-the-art methods, yielding remarkable generalization improvements on diverse unseen datasets. The code will be publicly available.

SDOct 23, 2025Code
Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

Zhiyu Lin, Jingwen Yang, Jiale Zhao et al.

Recent speech-to-speech (S2S) models generate intelligible speech but still lack natural expressiveness, largely due to the absence of a reliable evaluation metric. Existing approaches, such as subjective MOS ratings, low-level acoustic features, and emotion recognition are costly, limited, or incomplete. To address this, we present DeEAR (Decoding the Expressive Preference of eAR), a framework that converts human preference for speech expressiveness into an objective score. Grounded in phonetics and psychology, DeEAR evaluates speech across three dimensions: Emotion, Prosody, and Spontaneity, achieving strong alignment with human perception (Spearman's Rank Correlation Coefficient, SRCC = 0.86) using fewer than 500 annotated samples. Beyond reliable scoring, DeEAR enables fair benchmarking and targeted data curation. It not only distinguishes expressiveness gaps across S2S models but also selects 14K expressive utterances to form ExpressiveSpeech, which improves the expressive score (from 2.0 to 23.4 on a 100-point scale) of S2S models. Demos and codes are available at https://github.com/FreedomIntelligence/ExpressiveSpeech

AIOct 14, 2025Code
ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization

Sunzhu Li, Zhiyu Lin, Shuling Yang et al.

Large Reasoning Models (LRMs) are powerful, but they still suffer from inefficient and off-target reasoning. Currently, training-free methods are limited to either rigid heuristics or descriptive, non-actionable analyses. In this paper, we introduce ThinkPilot, a training-free framework that automatically optimizes LRMs reasoning. It uses an evolutionary process to generate think-prefixes, which are instructions that evolve driven by a taxonomy of reasoning behaviors to guide models toward superior performance. Extensive experiments demonstrate ThinkPilot's broad effectiveness: it significantly improves the accuracy-length trade-off for efficient reasoning, drastically improves safety (for example, cutting the StrongREJECT score of DeepSeek-R1-Distill-Qwen-32B from 27.0% to 0.7), and enhances instruction following. It also synergizes with existing training-based methods. Our analysis reveals that think-prefixes can reliably control LRMs' reasoning behaviors, and that different tasks have strong preferences for specific behavioral distributions. By automatically identifying and eliciting these behaviors, ThinkPilot provides a generalizable framework for aligning LRMs reasoning with task demands. Data and code are available at https://github.com/teqkilla/ThinkPilot

CLMar 23, 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning

Zhiyu Lin, Yifei Gao, Xian Zhao et al.

Language models have recently advanced into the realm of reasoning, yet it is through multimodal reasoning that we can fully unlock the potential to achieve more comprehensive, human-like cognitive capabilities. This survey provides a systematic overview of the recent multimodal reasoning approaches, categorizing them into two levels: language-centric multimodal reasoning and collaborative multimodal reasoning. The former encompasses one-pass visual perception and active visual perception, where vision primarily serves a supporting role in language reasoning. The latter involves action generation and state update within reasoning process, enabling a more dynamic interaction between modalities. Furthermore, we analyze the technical evolution of these methods, discuss their inherent challenges, and introduce key benchmark tasks and evaluation metrics for assessing multimodal reasoning performance. Finally, we provide insights into future research directions from the following two perspectives: (i) from visual-language reasoning to omnimodal reasoning and (ii) from multimodal reasoning to multimodal agents. This survey aims to provide a structured overview that will inspire further advancements in multimodal reasoning research.

CLMar 7, 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

Feng Jiang, Zhiyu Lin, Fan Bu et al.

The rapid development of large language models (LLMs) has brought significant attention to speech models, particularly recent progress in speech2speech protocols supporting speech input and output. However, the existing benchmarks adopt automatic text-based evaluators for evaluating the instruction following ability of these models lack consideration for paralinguistic information in both speech understanding and generation. To address these issues, we introduce S2S-Arena, a novel arena-style S2S benchmark that evaluates instruction-following capabilities with paralinguistic information in both speech-in and speech-out across real-world tasks. We design 154 samples that fused TTS and live recordings in four domains with 21 tasks and manually evaluate existing popular speech models in an arena-style manner. The experimental results show that: (1) in addition to the superior performance of GPT-4o, the speech model of cascaded ASR, LLM, and TTS outperforms the jointly trained model after text-speech alignment in speech2speech protocols; (2) considering paralinguistic information, the knowledgeability of the speech model mainly depends on the LLM backbone, and the multilingual support of that is limited by the speech module; (3) excellent speech models can already understand the paralinguistic information in speech input, but generating appropriate audio with paralinguistic information is still a challenge.

CVMar 13, 2024
AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

Yifei Gao, Jiaqi Wang, Zhiyu Lin et al.

The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: While AI-generated contents have played a crucial role in a wide range of AI models, the potential hidden risks they introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC \textbf{hallucination bias}: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations on Q-former and Linear projector reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.

CLJun 9, 2025
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao et al.

With the rapid advancement of Generative AI technology, Multimodal Large Language Models(MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Considering that the model requires a confluence of multidimensional sub-capabilities to address the challenges of various development phases, constructing a multi-view evaluation framework is crucial for accurately guiding the enhancement of development efficiency. However, existing benchmarks usually fail to provide an assessment of sub-capabilities and focus solely on webpage generation outcomes. In this work, we draw inspiration from the principles of software engineering and further propose WebUIBench, a benchmark systematically designed to evaluate MLLMs in four key areas: WebUI Perception, HTML Programming,WebUI-HTML Understanding, and WebUI-to-Code. WebUIBench comprises 21K high-quality question-answer pairs derived from over 0.7K real-world websites. The extensive evaluation of 29 mainstream MLLMs uncovers the skill characteristics and various weakness that models encountered during the development process.

AIApr 2
TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns

Zhongbo Wang, Zhiyu Lin, Zhu Wang et al.

Large Language Model-driven (LLM-driven) social bots pose a growing threat to online discourse by generating human-like content that evades conventional detection. Existing methods suffer from limited detection accuracy due to overreliance on single-modality signals, insufficient sensitivity to the specific generative patterns of Artificial Intelligence-Generated Content (AIGC), and a failure to adequately model the interplay between linguistic patterns and behavioral dynamics. To address these limitations, we propose TRACE-Bot, a unified dual-channel framework that jointly models implicit semantic representations and AIGC-enhanced behavioral patterns. TRACE-Bot constructs fine-grained representations from heterogeneous sources, including personal information data, interaction behavior data and tweet data. A dual-channel architecture captures linguistic representations via a pretrained language model and behavioral irregularities via multidimensional activity features augmented with signals from state-of-the-art (SOTA) AIGC detectors. The fused representations are then classified through a lightweight prediction head. Experiments on two public LLM-driven social bot datasets demonstrate SOTA performance, achieving accuracies of 98.46% and 97.50%, respectively. The results further indicate strong robustness against advanced bot strategies, highlighting the effectiveness of jointly leveraging implicit semantic representations and AIGC-enhanced behavioral patterns for emerging LLM-driven social bot detection.

CRJun 9, 2025
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

Haoyang Li, Huan Gao, Zhiyuan Zhao et al.

The widespread adoption of Large Language Models (LLMs) has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that leverage crafted prompts to generate malicious outputs. While prior research has been conducted on general security capabilities of LLMs, their specific susceptibility to jailbreak attacks in code generation remains largely unexplored. To fill this gap, we propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation, designed to evaluate LLM robustness against such threats. MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities: specifically, the average rejection rate for malicious content is 60.93%, dropping to 39.92% when combined with jailbreak attack algorithms. Our work highlights that the code security capabilities of LLMs still pose significant challenges.

CVOct 12, 2024
Debiasing Vison-Language Models with Text-Only Training

Yunfan Yang, Chaoquan Jiang, Zhiyu Lin et al.

Pre-trained vision-language models (VLMs), such as CLIP, have exhibited remarkable performance across various downstream tasks by aligning text and images in a unified embedding space. However, due to the imbalanced distribution of pre-trained datasets, CLIP suffers from the bias problem in real-world applications. Existing debiasing methods struggle to obtain sufficient image samples for minority groups and incur high costs for group labeling. To address the limitations, we propose a Text-Only Debiasing framework called TOD, leveraging a text-as-image training paradigm to mitigate visual biases. Specifically, this approach repurposes the text encoder to function as an image encoder, thereby eliminating the need for image data. Simultaneously, it utilizes a large language model (LLM) to generate a balanced text dataset, which is then used for prompt tuning. However, we observed that the model overfits to the text modality because label names, serving as supervision signals, appear explicitly in the texts. To address this issue, we further introduce a Multi-Target Prediction (MTP) task that motivates the model to focus on complex contexts and distinguish between target and biased information. Extensive experiments on the Waterbirds and CelebA datasets show that our method significantly improves group robustness, achieving state-of-the-art results among image-free methods and even competitive performance compared to image-supervised methods. Furthermore, the proposed method can be adapted to challenging scenarios with multiple or unknown bias attributes, demonstrating its strong generalization and robustness.

AIOct 6, 2025
What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions

Reza Habibi, Seung Wan Ha, Zhiyu Lin et al.

Meaningful human-AI collaboration requires more than processing language; it demands a deeper understanding of symbols and their socially constructed meanings. While humans naturally interpret symbols through social interaction, AI systems often miss the dynamic interpretations that emerge in conversation. Drawing on Symbolic Interactionism theory, we conducted two studies to investigate how humans and AI co-construct symbols and their meanings. Findings provide empirical insights into how humans and conversational AI agents collaboratively shape meanings during interaction. We show how participants shift their initial definitions of meaning in response to the symbols and interpretations suggested by the conversational AI agents, especially when social context is introduced. We also observe how participants project their personal and social values into these interactions, refining meanings over time. These findings reveal that shared understanding does not emerge from mere agreement but from the bi-directional exchange and reinterpretation of symbols, suggesting new paradigms for human-AI interaction design.

AIJun 28, 2024
External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling

Rishav Bhagat, Jonathan Balloch, Zhiyu Lin et al.

Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments. In spite of only experiencing the world through their own observations and interactions, people know how to balance focusing on tasks with learning about how changes may affect their understanding of the world. This is possible by choosing to solve tasks in ways that are interesting and generally informative beyond just the current task. Motivated by this, we propose an agent influence framework for RL agents to improve the adaptation efficiency of external models in changing environments without any changes to the agent's rewards. Our formulation is composed of two self-contained modules: interest fields and behavior shaping via interest fields. We implement an uncertainty-based interest field algorithm as well as a skill-sampling-based behavior-shaping algorithm to use in testing this framework. Our results show that our method outperforms the baselines in terms of external model adaptation on metrics that measure both efficiency and performance.

AIMay 3, 2023
Beyond Prompts: Exploring the Design Space of Mixed-Initiative Co-Creativity Systems

Zhiyu Lin, Upol Ehsan, Rohan Agarwal et al.

Generative Artificial Intelligence systems have been developed for image, code, story, and game generation with the goal of facilitating human creativity. Recent work on neural generative systems has emphasized one particular means of interacting with AI systems: the user provides a specification, usually in the form of prompts, and the AI system generates the content. However, there are other configurations of human and AI coordination, such as co-creativity (CC) in which both human and AI systems can contribute to content creation, and mixed-initiative (MI) in which both human and AI systems can initiate content changes. In this paper, we define a hypothetical human-AI configuration design space consisting of different means for humans and AI systems to communicate creative intent to each other. We conduct a human participant study with 185 participants to understand how users want to interact with differently configured MI-CC systems. We find out that MI-CC systems with more extensive coverage of the design space are rated higher or on par on a variety of creative and goal-completion metrics, demonstrating that wider coverage of the design space can improve user experience and achievement when using the system; Preference varies greatly between expertise groups, suggesting the development of adaptive, personalized MI-CC systems; Participants identified new design space dimensions including scrutability -- the ability to poke and prod at models -- and explainability.

AIJul 26, 2021
Benign Adversarial Attack: Tricking Models for Goodness

Jitao Sang, Xian Zhao, Jiaming Zhang et al.

In spite of the successful application in many fields, machine learning models today suffer from notorious problems like vulnerability to adversarial examples. Beyond falling into the cat-and-mouse game between adversarial attack and defense, this paper provides alternative perspective to consider adversarial example and explore whether we can exploit it in benign applications. We first attribute adversarial example to the human-model disparity on employing non-semantic features. While largely ignored in classical machine learning mechanisms, non-semantic feature enjoys three interesting characteristics as (1) exclusive to model, (2) critical to affect inference, and (3) utilizable as features. Inspired by this, we present brave new idea of benign adversarial attack to exploit adversarial examples for goodness in three directions: (1) adversarial Turing test, (2) rejecting malicious model application, and (3) adversarial data augmentation. Each direction is positioned with motivation elaboration, justification analysis and prototype applications to showcase its potential.

CLMar 23, 2021
Plug-and-Blend: A Framework for Controllable Story Generation with Blended Control Codes

Zhiyu Lin, Mark Riedl

Large pre-trained neural language models (LM) have very powerful text generation capabilities. However, in practice, they are hard to control for creative purposes. We describe a Plug-and-Play controllable language generation framework, Plug-and-Blend, that allows a human user to input multiple control codes (topics). In the context of automated story generation, this allows a human user loose or fine-grained control of the topics and transitions between them that will appear in the generated story, and can even allow for overlapping, blended topics. Automated evaluations show our framework, working with different generative LMs, controls the generation towards given continuous-weighted control codes while keeping the generated sentences fluent, demonstrating strong blending capability. A human participant evaluation shows that the generated stories are observably transitioning between two topics.

SDJun 28, 2018
GenerationMania: Learning to Semantically Choreograph

Zhiyu Lin, Kyle Xiao, Mark Riedl

Beatmania is a rhythm action game where players must reproduce some of the sounds of a song by pressing specific controller buttons at the correct time. In this paper we investigate the use of deep neural networks to automatically create game stages - called charts - for arbitrary pieces of music. Our technique uses a multi-layer feed-forward network trained on sound sequence summary statistics to predict which sounds in the music are to be played by the player and which will play automatically. We use another neural network along with rules to determine which controls should be mapped to which sounds. We evaluated our system on the ability to reconstruct charts in a held-out test set, achieving an $F_1$-score that significantly beats LSTM baselines.

AISep 12, 2017
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds

Zhiyu Lin, Brent Harrison, Aaron Keech et al.

We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep-reinforcement learning to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experimentation using a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning in navigating three-dimensional environments using Minecraft. We further show that our technique is robust to highly innacurate human feedback and can also operate when no human feedback is given.

CVJul 3, 2016
A Hierarchical Distributed Processing Framework for Big Image Data

Le Dong, Zhiyu Lin, Yan Liang et al.

This paper introduces an effective processing framework nominated ICP (Image Cloud Processing) to powerfully cope with the data explosion in image processing field. While most previous researches focus on optimizing the image processing algorithms to gain higher efficiency, our work dedicates to providing a general framework for those image processing algorithms, which can be implemented in parallel so as to achieve a boost in time efficiency without compromising the results performance along with the increasing image scale. The proposed ICP framework consists of two mechanisms, i.e. SICP (Static ICP) and DICP (Dynamic ICP). Specifically, SICP is aimed at processing the big image data pre-stored in the distributed system, while DICP is proposed for dynamic input. To accomplish SICP, two novel data representations named P-Image and Big-Image are designed to cooperate with MapReduce to achieve more optimized configuration and higher efficiency. DICP is implemented through a parallel processing procedure working with the traditional processing mechanism of the distributed system. Representative results of comprehensive experiments on the challenging ImageNet dataset are selected to validate the capacity of our proposed ICP framework over the traditional state-of-the-art methods, both in time efficiency and quality of results.