Zhiyu Lin

h-index 10

7 papers

816 citations

Novelty 46%

AI Score 53

Ranked #12,158 of 194,257 authors (top 6%) · #495 in AI (top 4%)

7 Papers

6.7 · CV · Apr 10 · Code

Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition

Yuxi Zhou, Zhengbo Zhang, Jingyu Pan et al.

Human action recognition is pivotal in computer vision, with applications ranging from surveillance to human-robot interaction. Despite the effectiveness of supervised skeleton-based methods, their reliance on exhaustive annotation limits generalization to novel actions. Zero-Shot Skeleton Action Recognition (ZSAR) emerges as a promising paradigm, yet it faces challenges due to the spectral bias of diffusion models, which oversmooth high-frequency dynamics. Here, we propose Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), integrating a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and Curriculum-based Semantic Abstraction to address these challenges. Our approach effectively recovers fine-grained motion details, achieving state-of-the-art performance on NTU RGB+D, PKU-MMD, and Kinetics-skeleton datasets. Code has been made available at https://github.com/yuzhi535/FDSM. Project homepage: https://yuzhi535.github.io/FDSM.github.io/

6.9 · AI · Apr 2 · Code

TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns

Zhongbo Wang, Zhiyu Lin, Zhu Wang et al.

Large Language Model-driven (LLM-driven) social bots pose a growing threat to online discourse by generating human-like content that evades conventional detection. Existing methods suffer from limited detection accuracy due to overreliance on single-modality signals, insufficient sensitivity to the specific generative patterns of Artificial Intelligence-Generated Content (AIGC), and a failure to adequately model the interplay between linguistic patterns and behavioral dynamics. To address these limitations, we propose TRACE-Bot, a unified dual-channel framework that jointly models implicit semantic representations and AIGC-enhanced behavioral patterns. TRACE-Bot constructs fine-grained representations from heterogeneous sources, including personal information data, interaction behavior data and tweet data. A dual-channel architecture captures linguistic representations via a pretrained language model and behavioral irregularities via multidimensional activity features augmented with signals from state-of-the-art (SOTA) AIGC detectors. The fused representations are then classified through a lightweight prediction head. Experiments on two public LLM-driven social bot datasets demonstrate SOTA performance, achieving accuracies of 98.46% and 97.50%, respectively. The results further indicate strong robustness against advanced bot strategies, highlighting the effectiveness of jointly leveraging implicit semantic representations and AIGC-enhanced behavioral patterns for emerging LLM-driven social bot detection.

7.1 · HC · Mar 27

Unlocking Open-Player-Modeling-enhanced Game-Based Learning: The Open Player Socially Analytical Intelligence Architecture

Zhiyu Lin, Boyd Fox, Devon Mckee et al. · gatech

Game-Based Learning (GBL) is a learner-engaging pedagogical methodology, yet adapting games to heterogeneous learners requires transparent, real-time Open Player Models (OPMs). We contribute Open Player Socially Analytical Intelligence (OPSAI), an architecture that implements OPM beyond conceptual frameworks and is validated in a GBL application. It decouples gameplay telemetry and analysis from the game engine and automatically derives pedagogically actionable insights, supporting the transparency of computational player models while making them accessible to players. OPSAI comprises three logical layers: a Frontend that both provides the GBL experience and collects information needed for analytics; a stateless Backend that hosts transparent analytics services producing reflective prompts, recommendations, and visualization guides; and a two-tier Log Storage that balances heavy raw gameplay data with lightweight reference indices for low-latency queries. By feeding analytics outputs back into the game interface, OPSAI closes the feedback loop between play and learning, empowering teachers, researchers, and learners alike. We further showcase OPSAI with a full deployment on the Parallel GBL environment, featuring live play traces, peer comparisons, and personalized suggestions, demonstrating a reusable blueprint for future educational games.

32.0 · CL · Mar 23, 2021 · Code

Plug-and-Blend: A Framework for Controllable Story Generation with Blended Control Codes

Zhiyu Lin, Mark Riedl

Large pre-trained neural language models (LM) have very powerful text generation capabilities. However, in practice, they are hard to control for creative purposes. We describe a Plug-and-Play controllable language generation framework, Plug-and-Blend, that allows a human user to input multiple control codes (topics). In the context of automated story generation, this allows a human user loose or fine-grained control of the topics and transitions between them that will appear in the generated story, and can even allow for overlapping, blended topics. Automated evaluations show our framework, working with different generative LMs, controls the generation towards given continuous-weighted control codes while keeping the generated sentences fluent, demonstrating strong blending capability. A human participant evaluation shows that the generated stories are observably transitioning between two topics.
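
As a rough illustration of what "continuous-weighted control codes" could look like in practice, the sketch below mixes per-topic biases into a language model's next-token logits and shifts the weights from one topic to another across a story. This is not the Plug-and-Blend implementation: the names `topic_schedule` and `blend_logits`, the additive-bias formulation, and the linear schedule are all assumptions made for illustration.

```python
from typing import Dict, List


def topic_schedule(num_sentences: int) -> List[Dict[str, float]]:
    """Hypothetical linear schedule: weight shifts from topic 'a' to topic 'b'."""
    if num_sentences < 1:
        raise ValueError("num_sentences must be at least 1")
    weights = []
    for i in range(num_sentences):
        # Position in [0, 1]; a single-sentence story gets an even blend.
        t = i / (num_sentences - 1) if num_sentences > 1 else 0.5
        weights.append({"a": 1.0 - t, "b": t})
    return weights


def blend_logits(
    base: List[float],
    topic_bias: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Add each topic's bias to the base logits, scaled by that topic's weight.

    Assumption: every bias vector has the same length as `base`.
    """
    blended = list(base)
    for topic, w in weights.items():
        for j, b in enumerate(topic_bias[topic]):
            blended[j] += w * b
    return blended


# Usage: a three-sentence story that moves from topic 'a' to topic 'b'.
schedule = topic_schedule(3)
middle = blend_logits([0.0, 0.0], {"a": [2.0, 0.0], "b": [0.0, 2.0]}, schedule[1])
```

In the middle sentence both topics carry weight 0.5, so each topic's preferred token receives the same boost, which is one simple way to picture an overlapping, blended topic.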

7.1 · SD · Jun 28, 2018 · Code

GenerationMania: Learning to Semantically Choreograph

Zhiyu Lin, Kyle Xiao, Mark Riedl

Beatmania is a rhythm action game where players must reproduce some of the sounds of a song by pressing specific controller buttons at the correct time. In this paper we investigate the use of deep neural networks to automatically create game stages - called charts - for arbitrary pieces of music. Our technique uses a multi-layer feed-forward network trained on sound sequence summary statistics to predict which sounds in the music are to be played by the player and which will play automatically. We use another neural network along with rules to determine which controls should be mapped to which sounds. We evaluated our system on the ability to reconstruct charts in a held-out test set, achieving an $F_1$-score that significantly beats LSTM baselines.
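
For readers unfamiliar with the metric, the $F_1$-score mentioned above is the harmonic mean of precision and recall. The sketch below computes it for a binary labeling in which 1 marks a sound the player must play and 0 marks a sound that plays automatically; that label encoding and the function name `f1_score` are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with 1 as the positive class (assumed: 'playable' sound)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage: one hit, one false alarm, one miss.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```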

17.3 · AI · Sep 12, 2017

Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds

Zhiyu Lin, Brent Harrison, Aaron Keech et al.

We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep reinforcement learning to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experimentation using a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning in navigating three-dimensional environments using Minecraft. We further show that our technique is robust to highly inaccurate human feedback and can also operate when no human feedback is given.
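
One simple way to picture the three-way choice described above is a per-step arbiter that listens to feedback with a probability tied to how much the feedback is trusted, and otherwise falls back to epsilon-greedy exploration. The sketch below is only that picture, not the paper's method: the function `choose_strategy`, the scalar `feedback_confidence`, and the epsilon-greedy fallback are assumptions for illustration.

```python
import random


def choose_strategy(
    feedback_available: bool,
    feedback_confidence: float,
    epsilon: float,
    rng: random.Random,
) -> str:
    """Return 'listen', 'exploit', or 'explore' for the current step.

    Assumptions: feedback_confidence and epsilon are probabilities in [0, 1].
    """
    if not 0.0 <= feedback_confidence <= 1.0:
        raise ValueError("feedback_confidence must be in [0, 1]")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    # Follow the human only when feedback exists, and more often when trusted.
    if feedback_available and rng.random() < feedback_confidence:
        return "listen"
    # Otherwise fall back to epsilon-greedy over the agent's own policy.
    if rng.random() < epsilon:
        return "explore"
    return "exploit"


# Usage: with no feedback and epsilon = 0, the agent always exploits.
rng = random.Random(0)
decision = choose_strategy(False, 0.9, 0.0, rng)
```

With `feedback_confidence` at 0 or with no feedback at all, this reduces to plain epsilon-greedy, which mirrors the abstract's point that the agent can still operate when no human feedback is given.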

2.1 · CV · Jul 3, 2016

A Hierarchical Distributed Processing Framework for Big Image Data

Le Dong, Zhiyu Lin, Yan Liang et al.

This paper introduces an effective processing framework named ICP (Image Cloud Processing) to cope with the data explosion in the image processing field. While most previous research focuses on optimizing image processing algorithms to gain higher efficiency, our work is dedicated to providing a general framework for those image processing algorithms, which can be implemented in parallel to achieve a boost in time efficiency without compromising result quality as the image scale increases. The proposed ICP framework consists of two mechanisms, i.e. SICP (Static ICP) and DICP (Dynamic ICP). Specifically, SICP is aimed at processing the big image data pre-stored in the distributed system, while DICP is proposed for dynamic input. To accomplish SICP, two novel data representations named P-Image and Big-Image are designed to cooperate with MapReduce to achieve more optimized configuration and higher efficiency. DICP is implemented through a parallel processing procedure working with the traditional processing mechanism of the distributed system. Representative results of comprehensive experiments on the challenging ImageNet dataset are selected to validate the capacity of our proposed ICP framework over the traditional state-of-the-art methods, both in time efficiency and quality of results.