32.7ROMar 12Code
MiNI-Q: A Miniature, Wire-Free Quadruped with Unbounded, Independently Actuated Leg JointsDaniel Koh, Suraj Shah, Yufeng Wu et al.
Physical joint limits are common in legged robots and can restrict workspace, constrain gait design, and increase the risk of hardware damage. This paper introduces MiNI-Q^2, a miniature, wire-free quadruped robot with independently actuated, mechanically unbounded 2-DOF leg joints. We present the mechanical design, kinematic analysis, and experimental validation of the proposed robot. The leg mechanism enables both oscillatory gaits and rotary locomotion while allowing the robot to fold to a minimum height of 2.5 cm. Experimentally, MiNI-Q achieves speeds up to 0.46 m/s and demonstrates low-clearance crawling, stair climbing, inverted locomotion, jumping, and backflipping. The wire-free architecture extends our previous Q8bot design, improving assembly reliability at miniature scale. All mechanical and electrical design files are released open source to support reproducibility and further research.
HCFeb 3
"I'm happy even though it's not real": GenAI Photo Editing as a Remembering ExperienceYufeng Wu, Qing Li, Elise van den Hoven et al.
Generative Artificial Intelligence (GenAI) is increasingly integrated into photo applications on personal devices, making editing photographs easier than ever while potentially influencing the memories they represent. This study explores how and why people use GenAI to edit personal photos and how this shapes their remembering experience. We conducted a two-phase qualitative study with 12 participants: a photo editing session using a GenAI tool guided by the Remembering Experience (RX) dimensions, followed by semi-structured interviews where participants reflected on the editing process and results. Findings show that participants prioritised felt memory over factual accuracy. For different photo elements, environments were modified easily, however, editing was deemed unacceptable if it touched upon a person's identity. Editing processes brought positive and negative impacts, and itself also became a remembering experience. We further discuss potential benefits and risks of GenAI editing for remembering purposes and propose design implications for responsible GenAI.
5.7CLApr 16Code
Exploring and Testing Skill-Based Behavioral Profile Annotation: Human Operability and LLM Feasibility under Schema-Guided ExecutionYufeng Wu
Behavioral Profile (BP) annotation is difficult to automate because it requires simultaneous coding across multiple linguistic dimensions. We treat BP annotation as a bundle of annotation skills rather than a single task and evaluate LLM-assisted BP annotation from this perspective. Using 3,134 concordance lines of 30 Chinese metaphorical color-term derivatives and a 14-feature BP schema, we implement a skill-file-driven pipeline in which each feature is externally defined through schema files, decision rules, and examples. Two human annotators completed a two-round schema-only protocol on a 300-instance validation subset, enabling BP skills to be classified as directly operable, recoverable under focused re-annotation, or structurally underspecified. GPT-5.4 and three locally deployable open-source models were then evaluated under the same setup. Results show that BP annotation is highly heterogeneous at the skill level: 5 skills are directly operable, 4 are recoverable after focused re-annotation, and 5 remain structurally underspecified. GPT-5.4 executes the retained skills with substantial reliability (accuracy = 0.678, \k{appa} = 0.665, weighted F1 = 0.695), but this feasibility is selective rather than global. Human and GPT difficulty profiles are strongly aligned at the skill level (r = 0.881), but not at the instance level (r = 0.016) or lexical-item level (r = -0.142), a pattern we describe as shared taxonomy, independent execution. Pairwise agreement further suggests that GPT is better understood as an independent third skill voice than as a direct human substitute. Open-source failures are concentrated in schema-to-skill execution problems. These findings suggest that automatic annotation should be evaluated in terms of skill feasibility rather than task-level automation.
CVJan 5
DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery LocalizationBoyang Zhao, Xin Liao, Jiaxin Chen et al.
The rapid evolution of AIGC technology enables misleading viewers by tampering mere small segments within a video, rendering video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained by \emph{local view}, failing to capture global anomalies. To address this, we propose a \underline{d}ual-stream graph learning and \underline{d}isentanglement framework for temporal forgery localization (DDNet). By coordinating a \emph{Temporal Distance Stream} for local artifacts and a \emph{Semantic Content Stream} for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE) to construct a robust feature foundation via deep fusion of hierarchical features. Experiments on ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9\% in AP@0.95, with significant improvements in cross-domain robustness.
58.7ROMar 18
DexEXO: A Wearability-First Dexterous Exoskeleton for Operator-Agnostic Demonstration and LearningAlvin Zhu, Mingzhang Zhu, Beom Jun Kim et al.
Scaling dexterous robot learning is constrained by the difficulty of collecting high-quality demonstrations across diverse operators. Existing wearable interfaces often trade comfort and cross-user adaptability for kinematic fidelity, while embodiment mismatch between demonstration and deployment requires visual post-processing before policy training. We present DexEXO, a wearability-first hand exoskeleton that aligns visual appearance, contact geometry, and kinematics at the hardware level. DexEXO features a pose-tolerant thumb mechanism and a slider-based finger interface analytically modeled to support hand lengths from 140~mm to 217~mm, reducing operator-specific fitting and enabling scalable cross-operator data collection. A passive hand visually matches the deployed robot, allowing direct policy training from raw wrist-mounted RGB observations. User studies demonstrate improved comfort and usability compared to prior wearable systems. Using visually aligned observations alone, we train diffusion policies that achieve competitive performance while substantially simplifying the end-to-end pipeline. These results show that prioritizing wearability and hardware-level embodiment alignment reduces both human and algorithmic bottlenecks without sacrificing task performance. Project Page: https://dexexo-research.github.io/
0.7CLMay 8
A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU FrameworkYufeng Wu
Metaphor is pervasive in everyday language, yet token-level computational identification of metaphor-related words in Chinese under the MIPVU framework remains under-explored relative to English. This paper presents a reproducible multi-architecture baseline for token-level metaphor identification on the PSU Chinese Metaphor Corpus (PSU CMC), the only widely available MIPVU-annotated Chinese corpus. We systematically compare three model families: (i) encoder fine-tuning with Chinese RoBERTa-wwm-ext-large; (ii) MelBERT adapted to Chinese using a newly constructed basic-meaning resource derived from the Modern Chinese Dictionary, 7th edition (MCD7), comprising 74,823 entries with 71.51% PSU CMC vocabulary coverage; and (iii) Qwen3.5-9B fine-tuned with QLoRA as an instruction-tuned generative baseline. Across five fixed seeds, MelBERT MIP-only achieves the strongest performance at 0.7281 +/- 0.0050 test positive F1, marginally above MelBERT Full (0.7270 +/- 0.0069) and clearly above plain RoBERTa (0.7142 +/- 0.0121). The Qwen QLoRA generative configuration trails encoder baselines by approximately 11 F1 points (0.6157 +/- 0.0113). Three findings merit attention: (1) the SPV channel of MelBERT does not contribute reliable positive signal in Chinese, consistent with the dominance of conventional metaphor; (2) the Qwen-encoder gap is concentrated in recall, reflecting the discrete-commitment limitation of generative output; (3) several Qwen task formulations fail due to format design rather than model capacity. We release all split manifests, per-seed outputs, the MCD7 basic-meaning embedding pipeline, and training scripts to serve as a common reference for future Chinese metaphor identification research.
LGOct 31, 2024
Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path PerturbationWenhan Chang, Tianqing Zhu, Ping Xiong et al.
In the rapid advancement of artificial intelligence, privacy protection has become crucial, giving rise to machine unlearning. Machine unlearning is a technique that removes specific data influences from trained models without the need for extensive retraining. However, it faces several key challenges, including accurately implementing unlearning, ensuring privacy protection during the unlearning process, and achieving effective unlearning without significantly compromising model performance. This paper presents a novel approach to machine unlearning by employing Layer-wise Relevance Analysis and Neuronal Path Perturbation. We address three primary challenges: the lack of detailed unlearning principles, privacy guarantees in zero-shot unlearning scenario, and the balance between unlearning effectiveness and model utility. Our method balances machine unlearning performance and model utility by identifying and perturbing highly relevant neurons, thereby achieving effective unlearning. By using data not present in the original training set during the unlearning process, we satisfy the zero-shot unlearning scenario and ensure robust privacy protection. Experimental results demonstrate that our approach effectively removes targeted data from the target unlearning model while maintaining the model's utility, offering a practical solution for privacy-preserving machine learning.
IVApr 2, 2024
Improving and Evaluating Machine Learning Methods for Forensic Shoeprint MatchingDivij Jain, Saatvik Kher, Lena Liang et al.
We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random forest that generates a probabilistic measurement of how likely two prints are to have originated from the same outsole. We assess the generalisability of machine learning methods trained on lab shoeprint scans to more realistic crime scene shoeprint data by evaluating the accuracy of our methods on several shoeprint scenarios: partial prints, prints with varying levels of blurriness, prints with different amounts of wear, and prints from different shoe models. We find that models trained on one type of shoeprint yield extremely high levels of accuracy when tested on shoeprint pairs of the same scenario but fail to generalise to other scenarios. We also discover that models trained on a variety of scenarios predict almost as accurately as models trained on specific scenarios.
CVApr 26, 2025
Sim-to-Real: An Unsupervised Noise Layer for Screen-Camera Watermarking RobustnessYufeng Wu, Xin Liao, Baowei Wang et al.
Unauthorized screen capturing and dissemination pose severe security threats such as data leakage and information theft. Several studies propose robust watermarking methods to track the copyright of Screen-Camera (SC) images, facilitating post-hoc certification against infringement. These techniques typically employ heuristic mathematical modeling or supervised neural network fitting as the noise layer, to enhance watermarking robustness against SC. However, both strategies cannot fundamentally achieve an effective approximation of SC noise. Mathematical simulation suffers from biased approximations due to the incomplete decomposition of the noise and the absence of interdependence among the noise components. Supervised networks require paired data to train the noise-fitting model, and it is difficult for the model to learn all the features of the noise. To address the above issues, we propose Simulation-to-Real (S2R). Specifically, an unsupervised noise layer employs unpaired data to learn the discrepancy between the modeled simulated noise distribution and the real-world SC noise distribution, rather than directly learning the mapping from sharp images to real-world images. Learning this transformation from simulation to reality is inherently simpler, as it primarily involves bridging the gap in noise distributions, instead of the complex task of reconstructing fine-grained image details. Extensive experimental results validate the efficacy of the proposed method, demonstrating superior watermark robustness and generalization compared to state-of-the-art methods.
LGNov 2, 2024
Network Causal Effect Estimation In Graphical Models Of Contagion And Latent ConfoundingYufeng Wu, Rohit Bhattacharya
A key question in many network studies is whether the observed correlations between units are primarily due to contagion or latent confounding. Here, we study this question using a segregated graph (Shpitser, 2015) representation of these mechanisms, and examine how uncertainty about the true underlying mechanism impacts downstream computation of network causal effects, particularly under full interference -- settings where we only have a single realization of a network and each unit may depend on any other unit in the network. Under certain assumptions about asymptotic growth of the network, we derive likelihood ratio tests that can be used to identify whether different sets of variables -- confounders, treatments, and outcomes -- across units exhibit dependence due to contagion or latent confounding. We then propose network causal effect estimation strategies that provide unbiased and consistent estimates if the dependence mechanisms are either known or correctly inferred using our proposed tests. Together, the proposed methods allow network effect estimation in a wider range of full interference scenarios that have not been considered in prior work. We evaluate the effectiveness of our methods with synthetic data and the validity of our assumptions using real-world networks.
LGJan 6, 2024
Exploration of Adolescent Depression Risk Prediction Based on Census Surveys and General Life IssuesQiang Li, Yufeng Wu, Zhan Xu et al.
In contemporary society, the escalating pressures of life and work have propelled psychological disorders to the forefront of modern health concerns, an issue that has been further accentuated by the COVID-19 pandemic. The prevalence of depression among adolescents is steadily increasing, and traditional diagnostic methods, which rely on scales or interviews, prove particularly inadequate for detecting depression in young people. Addressing these challenges, numerous AI-based methods for assisting in the diagnosis of mental health issues have emerged. However, most of these methods center around fundamental issues with scales or use multimodal approaches like facial expression recognition. Diagnosis of depression risk based on everyday habits and behaviors has been limited to small-scale qualitative studies. Our research leverages adolescent census data to predict depression risk, focusing on children's experiences with depression and their daily life situations. We introduced a method for managing severely imbalanced high-dimensional data and an adaptive predictive approach tailored to data structure characteristics. Furthermore, we proposed a cloud-based architecture for automatic online learning and data updates. This study utilized publicly available NSCH youth census data from 2020 to 2022, encompassing nearly 150,000 data entries. We conducted basic data analyses and predictive experiments, demonstrating significant performance improvements over standard machine learning and deep learning algorithms. This affirmed our data processing method's broad applicability in handling imbalanced medical data. Diverging from typical predictive method research, our study presents a comprehensive architectural solution, considering a wider array of user needs.
59.6ROMar 12
WHED: A Wearable Hand Exoskeleton for Natural, High-Quality Demonstration CollectionMingzhang Zhu, Alvin Zhu, Jose Victor S. H. Ramos et al.
Scalable learning of dexterous manipulation remains bottlenecked by the difficulty of collecting natural, high-fidelity human demonstrations of multi-finger hands due to occlusion, complex hand kinematics, and contact-rich interactions. We present WHED, a wearable hand-exoskeleton system designed for in-the-wild demonstration capture, guided by two principles: wearability-first operation for extended use and a pose-tolerant, free-to-move thumb coupling that preserves natural thumb behaviors while maintaining a consistent mapping to the target robot thumb degrees of freedom. WHED integrates a linkage-driven finger interface with passive fit accommodation, a modified passive hand with robust proprioceptive sensing, and an onboard sensing/power module. We also provide an end-to-end data pipeline that synchronizes joint encoders, AR-based end-effector pose, and wrist-mounted visual observations, and supports post-processing for time alignment and replay. We demonstrate feasibility on representative grasping and manipulation sequences spanning precision pinch and full-hand enclosure grasps, and show qualitative consistency between collected demonstrations and replayed executions.
IRAug 4, 2025
Evaluating User Experience in Conversational Recommender Systems: A Systematic Review Across Classical and LLM-Powered ApproachesRaj Mahmud, Yufeng Wu, Abdullah Bin Sawad et al.
Conversational Recommender Systems (CRSs) are receiving growing research attention across domains, yet their user experience (UX) evaluation remains limited. Existing reviews largely overlook empirical UX studies, particularly in adaptive and large language model (LLM)-based CRSs. To address this gap, we conducted a systematic review following PRISMA guidelines, synthesising 23 empirical studies published between 2017 and 2025. We analysed how UX has been conceptualised, measured, and shaped by domain, adaptivity, and LLM. Our findings reveal persistent limitations: post hoc surveys dominate, turn-level affective UX constructs are rarely assessed, and adaptive behaviours are seldom linked to UX outcomes. LLM-based CRSs introduce further challenges, including epistemic opacity and verbosity, yet evaluations infrequently address these issues. We contribute a structured synthesis of UX metrics, a comparative analysis of adaptive and nonadaptive systems, and a forward-looking agenda for LLM-aware UX evaluation. These findings support the development of more transparent, engaging, and user-centred CRS evaluation practices.
LGJan 19, 2024
Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement LearningBairong Deng, Tao Yu, Zhenning Pan et al.
Reinforcement learning is an emerging approaches to facilitate multi-stage sequential decision-making problems. This paper studies a real-time multi-stage stochastic power dispatch considering multivariate uncertainties. Current researches suffer from low generalization and practicality, that is, the learned dispatch policy can only handle a specific dispatch scenario, its performance degrades significantly if actual samples and training samples are inconsistent. To fill these gaps, a novel contextual meta graph reinforcement learning (Meta-GRL) for a highly generalized multi-stage optimal dispatch policy is proposed. Specifically, a more general contextual Markov decision process (MDP) and scalable graph representation are introduced to achieve a more generalized multi-stage stochastic power dispatch modeling. An upper meta-learner is proposed to encode context for different dispatch scenarios and learn how to achieve dispatch task identification while the lower policy learner learns context-specified dispatch policy. After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updations of the hypothesis judgments generated by the meta-learner. Numerical comparisons with state-of-the-art policies and traditional reinforcement learning verify the optimality, efficiency, adaptability, and scalability of the proposed Meta-GRL.
IVMar 1, 2020
Weak Texture Information Map Guided Image Super-resolution with Deep Residual NetworksBo Fu, Liyan Wang, Yuechu Wu et al.
Single image super-resolution (SISR) is an image processing task which obtains high-resolution (HR) image from a low-resolution (LR) image. Recently, due to the capability in feature extraction, a series of deep learning methods have brought important crucial improvement for SISR. However, we observe that no matter how deeper the networks are designed, they usually do not have good generalization ability, which leads to the fact that almost all of existing SR methods have poor performances on restoration of the weak texture details. To solve these problems, we propose a weak texture information map guided image super-resolution with deep residual networks. It contains three sub-networks, one main network which extracts the main features and fuses weak texture details, another two auxiliary networks extract the weak texture details fallen in the main network. Two part of networks work cooperatively, the auxiliary networks predict and integrates week texture information into the main network, which is conducive to the main network learning more inconspicuous details. Experiments results demonstrate that our method's performs achieve the state-of-the-art quantitatively. Specifically, the image super-resolution results of our method own more weak texture details.
QUANT-PHAug 10, 2018
Learning and Inference on Generative Adversarial Quantum CircuitsJinfeng Zeng, Yufeng Wu, Jin-Guo Liu et al.
Quantum mechanics is inherently probabilistic in light of Born's rule. Using quantum circuits as probabilistic generative models for classical data exploits their superior expressibility and efficient direct sampling ability. However, training of quantum circuits can be more challenging compared to classical neural networks due to lack of efficient differentiable learning algorithm. We devise an adversarial quantum-classical hybrid training scheme via coupling a quantum circuit generator and a classical neural network discriminator together. After training, the quantum circuit generative model can infer missing data with quadratic speed up via amplitude amplification. We numerically simulate the learning and inference of generative adversarial quantum circuit using the prototypical Bars-and-Stripes dataset. Generative adversarial quantum circuits is a fresh approach to machine learning which may enjoy the practically useful quantum advantage on near-term quantum devices.