CVJan 15Code
V-Zero: Self-Improving Multimodal Reasoning with Zero AnnotationHan Wang, Yi Yang, Jingyuan Hu et al.
Recent advances in multimodal learning have significantly enhanced the reasoning capabilities of vision-language models (VLMs). However, state-of-the-art approaches rely heavily on large-scale human-annotated datasets, which are costly and time-consuming to acquire. To overcome this limitation, we introduce V-Zero, a general post-training framework that facilitates self-improvement using exclusively unlabeled images. V-Zero establishes a co-evolutionary loop by instantiating two distinct roles: a Questioner and a Solver. The Questioner learns to synthesize high-quality, challenging questions by leveraging a dual-track reasoning reward that contrasts intuitive guesses with reasoned results. The Solver is optimized using pseudo-labels derived from majority voting over its own sampled responses. Both roles are trained iteratively via Group Relative Policy Optimization (GRPO), driving a cycle of mutual enhancement. Remarkably, without a single human annotation, V-Zero achieves consistent performance gains on Qwen2.5-VL-7B-Instruct, improving visual mathematical reasoning by +1.7 and general vision-centric by +2.6, demonstrating the potential of self-improvement in multimodal systems. Code is available at https://github.com/SatonoDia/V-Zero
NAMay 19
A Novel Stochastic Particle-Field Algorithm for a Reaction-Diffusion-Advection Cancer Invasion ModelJingyuan Hu, Zhongjian Wang, Jack Xin et al.
In this paper, we present a novel numerical framework for solving a specific biological reaction-diffusion-advection system of cancer growth in three dimensions (3D) using particles of variable mass. We adopt empirical particle measures to represent cell density and dynamically construct the concentration fields of multiple related chemical species throughout the 3D domain. Efficient interaction between the particles and the spatial grid is achieved through a Particle-in-Cell (PIC) algorithm, while diffusion in space is solved rapidly using a spectral method. We demonstrate that for this particular system, the rate of change of particle mass remains bounded over finite time intervals. Furthermore, in addition to the inherent positivity preservation of cell density guaranteed by the empirical particle measures, the concentrations constructed by the algorithm are also unconditionally positivity-preserving on the spatial grid. Moreover, we present a rigorous error analysis for the proposed method, and numerical experiments confirm the theoretical convergence rates. To the best of our knowledge, this is the first numerical work to solve this system in three dimensions, wherein a rapid spread of cells driven by haptotactic flux is observed, similar to the behavior documented in the two-dimensional case.
CRApr 1Code
Do Phone-Use Agents Respect Your Privacy?Zhengyang Tang, Ke Ji, Xidong Wang et al.
We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at~ https://github.com/tangzhy/MyPhoneBench.
HCFeb 14, 2024
AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous SystemsJiaying Lu, Bo Pan, Jieyi Chen et al.
Recently, Large Language Model based Autonomous system(LLMAS) has gained great popularity for its potential to simulate complicated behaviors of human societies. One of its main challenges is to present and analyze the dynamic events evolution of LLMAS. In this work, we present a visualization approach to explore detailed statuses and agents' behavior within LLMAS. We propose a general pipeline that establishes a behavior structure from raw LLMAS execution events, leverages a behavior summarization algorithm to construct a hierarchical summary of the entire structure in terms of time sequence, and a cause trace method to mine the causal relationship between agent behaviors. We then develop AgentLens, a visual analysis system that leverages a hierarchical temporal visualization for illustrating the evolution of LLMAS, and supports users to interactively investigate details and causes of agents' behaviors. Two usage scenarios and a user study demonstrate the effectiveness and usability of our AgentLens.
MLJun 27, 2025
Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddingsRong Ma, Xi Li, Jingyuan Hu et al.
Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. Many biological processes unfold along continuous trajectories, yet it remains challenging to extract smooth, low-dimensional representations from inherently noisy, high-dimensional single-cell data. Neighbor embedding (NE) algorithms, such as t-SNE and UMAP, are widely used to embed high-dimensional single-cell data into low dimensions. But they often introduce undesirable distortions, resulting in misleading interpretations. Existing evaluation methods for NE algorithms primarily focus on separating discrete cell types rather than capturing continuous cell-state transitions, while dynamic modeling approaches rely on strong assumptions about cellular processes and specialized data. To address these challenges, we build on the Predictability-Computability-Stability (PCS) framework for reliable and reproducible data-driven discoveries. First, we systematically evaluate popular NE algorithms through empirical analysis, simulation, and theory, and reveal their key shortcomings, such as artifacts and instability. We then introduce NESS, a principled and interpretable machine learning approach to improve NE representations by leveraging algorithmic stability and to enable robust inference of smooth biological structures. NESS offers useful concepts, quantitative stability metrics, and efficient computational workflows to uncover developmental trajectories and cell-state transitions in single-cell data. Finally, we apply NESS to six single-cell datasets, spanning pluripotent stem cell differentiation, organoid development, and multiple tissue-specific lineage trajectories. Across these diverse contexts, NESS consistently yields useful biological insights, such as identification of transitional and stable cell states and quantification of transcriptional dynamics during development.
CLOct 27, 2021
Pay attention to emoji: Feature Fusion Network with EmoGraph2vec Model for Sentiment AnalysisXiaowei Yuan, Jingyuan Hu, Xiaodan Zhang et al.
With the explosive growth of social media, opinionated postings with emojis have increased explosively. Many emojis are used to express emotions, attitudes, and opinions. Emoji representation learning can be helpful to improve the performance of emoji-related natural language processing tasks, especially in text sentiment analysis. However, most studies have only utilized the fixed descriptions provided by the Unicode Consortium without consideration of actual usage scenarios. As for the sentiment analysis task, many researchers ignore the emotional impact of the interaction between text and emojis. It results that the emotional semantics of emojis cannot be fully explored. In this work, we propose a method called EmoGraph2vec to learn emoji representations by constructing a co-occurrence graph network from social data and enriching the semantic information based on an external knowledge base EmojiNet to embed emoji nodes. Based on EmoGraph2vec model, we design a novel neural network to incorporate text and emoji information into sentiment analysis, which uses a hybrid-attention module combined with TextCNN-based classifier to improve performance. Experimental results show that the proposed model can outperform several baselines for sentiment analysis on benchmark datasets. Additionally, we conduct a series of ablation and comparison experiments to investigate the effectiveness and interpretability of our model.
CLOct 27, 2021
Emoji-based Co-attention Network for Microblog Sentiment AnalysisXiaowei Yuan, Jingyuan Hu, Xiaodan Zhang et al.
Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotional-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information that fails to resolve the problem of ambiguity noise. Recent researches have utilized emojis as an independent input to classify text sentiment but they ignore the emotional impact of the interaction between text and emojis. It results that the emotional semantics of emojis cannot be fully explored. In this paper, we propose an emoji-based co-attention network that learns the mutual emotional semantics between text and emojis on microblogs. Our model adopts the co-attention mechanism based on bidirectional long short-term memory incorporating the text and emojis, and integrates a squeeze-and-excitation block in a convolutional neural network classifier to increase its sensitivity to emotional semantic features. Experimental results show that the proposed method can significantly outperform several baselines for sentiment analysis on short texts of social media.