Chenyu Yan

CL
h-index 10
3 papers
14 citations
Novelty 52%
AI Score 49

3 Papers

CL · Dec 11, 2024 · Code
Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Jiaqi Chen, Xiaoye Zhu, Tianyang Liu et al.

Large Language Models (LLMs) have revolutionized text generation, making machine-generated text increasingly difficult to detect. Although past methods perform well at detecting purely machine-generated text, they perform poorly at detecting machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from the original human prompt. Because the content of such text may originate from human prompts, detecting machine revision often relies on identifying distinctive machine styles, e.g., words favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within human-contributed content. We propose the "Imitate Before Detect" (ImBD) approach, which first imitates the machine-style token distribution and then compares the distribution of the text under test with that machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce style preference optimization (SPO), which aligns a scoring LLM to the stylistic preferences of machine-generated text. The aligned scoring model is then used to calculate the style-conditional probability curvature (Style-CPC), which quantifies the log-probability difference between the original text and conditionally sampled texts. We conduct extensive comparisons across various scenarios, covering text revisions by six LLMs, four text domains, and three machine revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs and improves performance by 5% and 19% for detecting GPT-3.5- and GPT-4o-revised text, respectively. Notably, our method surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness.
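The abstract describes Style-CPC only as the log-probability gap between the original text and conditionally sampled alternatives. The sketch below assumes the analytic, sampling-free curvature popularized by Fast-DetectGPT, with one model used for both sampling and scoring; in ImBD that model would be the SPO-aligned scorer. The function names and the toy logits are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def conditional_probability_curvature(
    logits: Sequence[Sequence[float]], tokens: Sequence[int]
) -> float:
    """Analytic curvature in the style of Fast-DetectGPT (an assumption here).

    logits[j] is the scoring model's next-token logit vector at position j,
    conditioned on the observed prefix; tokens[j] is the observed token id.
    Returns (log p(x) - E[log p(x~)]) / sqrt(Var[log p(x~)]), where x~ is
    resampled token by token from the same conditional distributions.
    """
    observed_ll = 0.0
    expected_ll = 0.0
    variance = 0.0
    for row, tok in zip(logits, tokens):
        logp = log_softmax(row)
        probs = [math.exp(lp) for lp in logp]
        observed_ll += logp[tok]
        # Per-position mean and second moment of log p under the model itself.
        mean_j = sum(p * lp for p, lp in zip(probs, logp))
        second_j = sum(p * lp * lp for p, lp in zip(probs, logp))
        expected_ll += mean_j
        variance += second_j - mean_j ** 2
    if variance <= 0:
        raise ValueError("degenerate distributions: curvature undefined")
    return (observed_ll - expected_ll) / math.sqrt(variance)


# Hypothetical toy scorer output: 3 positions over a 3-token vocabulary.
toy_logits = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.2], [1.0, 1.0, 3.0]]
typical = conditional_probability_curvature(toy_logits, [0, 1, 2])   # picks each argmax
atypical = conditional_probability_curvature(toy_logits, [2, 0, 1])  # picks unlikely tokens
print(f"typical: {typical:.2f}, atypical: {atypical:.2f}")
```

Text whose tokens match what the scorer itself prefers receives a positive curvature, while text built from tokens the scorer finds unlikely receives a negative one. Under the ImBD framing, aligning the scorer to machine style via SPO is what should make machine-revised phrasing land on the positive side.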

53.9 · CY · Apr 28
TransResAI: A Compound AI System for Coastal Transportation Resilience

Qingwen Pu, Kun Xie, Chenyu Yan

Coastal flooding increasingly threatens transportation infrastructure, yet the analytical tools needed for resilience management remain difficult for many non-specialist practitioners to use. This study presents TransResAI, a compound AI system that supports flood-aware transportation resilience analysis through natural-language interaction. The system integrates a locally deployable Large Language Model (LLM) with modules for task decomposition, secure code generation, geospatial analysis, retrieval-augmented generation, and interactive map rendering. TransResAI links MATSim flood-scenario simulation outputs, OpenStreetMap-derived flood-risk networks, equity-focused demographic indicators, and regional documents for Hampton Roads, Virginia. A structured user study with domain experts showed that TransResAI reduced task completion time by 80-88% relative to conventional GIS workflows, compressing analytical tasks from a mean of 197.1 seconds to 29.7 seconds and visualization tasks from 364.0 seconds to 46.1 seconds, while maintaining a mean accuracy of 4.60/5.00 and task completion rates exceeding 94%. These findings demonstrate that compound AI architectures bridge the gap between general-purpose language models on one side and, on the other, the specialized domain knowledge and quantitative rigor that infrastructure resilience analysis requires, giving transportation agencies and communities faster, more accessible analytical tools for decision-making under growing climate uncertainty.
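The reported mean task times can be turned into relative reductions and speedups with a few lines of arithmetic. This sketch uses only the four means stated in the abstract; the abstract does not say how the 80% lower bound of its range is derived, so the check covers the two means only.

```python
# Mean task completion times (seconds) stated in the TransResAI abstract:
# (conventional GIS workflow, TransResAI)
reported = {
    "analytical": (197.1, 29.7),
    "visualization": (364.0, 46.1),
}


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in completion time, in percent."""
    return 100.0 * (before - after) / before


def speedup(before: float, after: float) -> float:
    """How many times faster the new workflow is."""
    return before / after


for task, (gis, ai) in reported.items():
    print(f"{task}: {percent_reduction(gis, ai):.1f}% shorter, "
          f"{speedup(gis, ai):.1f}x faster")
# analytical: 84.9% shorter, 6.6x faster
# visualization: 87.3% shorter, 7.9x faster
```

Both mean reductions fall inside the 80-88% range stated in the abstract, and they correspond to roughly 6.6x and 7.9x speedups.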

56.2 · LG · Apr 9
From Synthesis to Clinical Assistance: A Strategy-Aware Agent Framework for Autism Intervention based on Real Clinical Dataset

Junhong Lai, Shuzhong Lai, Yanhao Yu et al.

The development of AI-assisted Early Intensive Behavioral Intervention (EIBI) for Autism Spectrum Disorder (ASD) is severely constrained by data scarcity. Furthermore, while Applied Behavior Analysis (ABA) serves as the gold standard for clinical intervention, general-purpose Large Language Models (LLMs) struggle to adhere strictly to its standardized procedures, often producing interactions that are linguistically fluent but strategically inconsistent. To address these challenges, we introduce ASDAgent, a strategy-aware framework designed to unify high-fidelity intervention dialogue synthesis and clinical decision support. ASDAgent incorporates two specialized components that address distinct problems: (i) a DoctorAgent equipped with an Observe-Think-Act-Correct (O-T-A-C) reasoning loop, which resolves strategy collapse in LLMs by making ABA execution explicit and controllable; and (ii) a ChildAgent that uses probabilistic behavior modeling to mitigate data homogeneity, simulating diverse and non-deterministic ASD response patterns. Experiments demonstrate that dialogues generated by ASDAgent closely mirror the strategy distribution of human therapists (KL divergence: 0.083). In real autism interventions, ASDAgent achieves nearly 80% strategic consistency with human experts. Moreover, we show that synthetic data produced by ASDAgent effectively distills professional clinical knowledge into small language models (SLMs), significantly enhancing their therapeutic capabilities.
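The KL divergence figure compares two categorical distributions over intervention strategies. The abstract does not state the strategy taxonomy, the direction of the divergence, or the logarithm base, so this sketch assumes KL(human || generated) in nats, with small additive smoothing to avoid zero probabilities. The strategy labels and counts below are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def strategy_distribution(
    labels: Iterable[str], support: List[str], eps: float = 1e-6
) -> Dict[str, float]:
    """Normalized frequency of each strategy label over a fixed support.

    Additive smoothing (eps) keeps every category strictly positive, so the
    KL divergence below stays finite when a strategy never occurs.
    """
    counts = Counter(labels)
    total = sum(counts[s] + eps for s in support)
    return {s: (counts[s] + eps) / total for s in support}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; both dicts must share the same keys."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)


# Hypothetical ABA strategy labels, one per therapist turn.
support = ["prompt", "reinforce", "model", "redirect"]
human = ["prompt"] * 40 + ["reinforce"] * 30 + ["model"] * 20 + ["redirect"] * 10
generated = ["prompt"] * 35 + ["reinforce"] * 35 + ["model"] * 18 + ["redirect"] * 12

p_human = strategy_distribution(human, support)
p_generated = strategy_distribution(generated, support)
kl = kl_divergence(p_human, p_generated)
print(f"KL(human || generated) = {kl:.4f} nats")
```

KL divergence is asymmetric, so KL(generated || human) would generally give a different value. A reported figure such as 0.083 is therefore only comparable across systems when the direction, log base, and strategy taxonomy are held fixed.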