Yulin Zhang

CV
h-index32
22papers
154citations
Novelty49%
AI Score54

22 Papers

NAJun 12, 2018
The eigenstructures of real (skew) circulant matrices with some applications

Zhongyun Liu, Siheng Chen, Weijin Xu et al.

The circulant matrices and skew-circulant matrices are two special classes of Toeplitz matrices and play vital roles in the computation of Toeplitz matrices. In this paper, we focus on real circulant and skewcirculant matrices. We first investigate their real Schur forms, which are closely related to the family of discrete cosine transform (DCT) and discrete sine transform (DST). Using those real Schur forms, we then develop some fast algorithms for computing real circulant, skew-circulant and Toeplitz matrix-real vector multiplications. Also, we develop a DCT-DST version of circulant and skew-circulant splitting (CSCS) iteration for real positive definite Toeplitz systems. Compared with the fast Fourier transform (FFT) version of CSCS iteration, the DCTDST version is more efficient and saves a half storage. Numerical experiments are presented to illustrate the effectiveness of our method.

CVJul 14, 2024Code
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

Cheng Shi, Yulin Zhang, Bin Yang et al.

Unsupervised 3D instance segmentation aims to segment objects from a 3D point cloud without any annotations. Existing methods face the challenge of either too loose or too tight clustering, leading to under-segmentation or over-segmentation. To address this issue, we propose Part2Object, hierarchical clustering with object guidance. Part2Object employs multi-layer clustering from points to object parts and objects, allowing objects to manifest at any layer. Additionally, it extracts and utilizes 3D objectness priors from temporally consecutive 2D RGB frames to guide the clustering process. Moreover, we propose Hi-Mask3D to support hierarchical 3D object part and instance segmentation. By training Hi-Mask3D on the objects and object parts extracted from Part2Object, we achieve consistent and superior performance compared to state-of-the-art models in various settings, including unsupervised instance segmentation, data-efficient fine-tuning, and cross-dataset generalization. Code is release at https://github.com/ChengShiest/Part2Object

NAJan 24, 2012
Structure-preserving Schur methods for computing square roots of real skew-Hamiltonian matrices

Zhongyun Liu, Yulin Zhang, Carla Ferreira et al.

Our contribution is two-folded. First, starting from the known fact that every real skew-Hamiltonian matrix has a real Hamiltonian square root, we give a complete characterization of the square roots of a real skew-Hamiltonian matrix W. Second, we propose a structure exploiting method for computing square roots of W. Compared to the standard real Schur method, which ignores the structure, our method requires significantly less arithmetic.

CVFeb 25
WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs

Yulin Zhang, Cheng Shi, Sibei Yang

Recent advances in Multimodal Large Language Models have greatly improved visual understanding and reasoning, yet their quadratic attention and offline training protocols make them ill-suited for streaming settings where frames arrive sequentially and future observations are inaccessible. We diagnose a core limitation of current Video-LLMs, namely Time-Agnosticism, in which videos are treated as an unordered bag of evidence rather than a causally ordered sequence, yielding two failures in streams: temporal order ambiguity, in which the model cannot follow or reason over the correct chronological order, and past-current focus blindness where it fails to distinguish present observations from accumulated history. We present WeaveTime, a simple, efficient, and model agnostic framework that first teaches order and then uses order. We introduce a lightweight Temporal Reconstruction objective-our Streaming Order Perception enhancement-that instills order aware representations with minimal finetuning and no specialized streaming data. At inference, a Past-Current Dynamic Focus Cache performs uncertainty triggered, coarse-to-fine retrieval, expanding history only when needed. Plugged into exsiting Video-LLM without architectural changes, WeaveTime delivers consistent gains on representative streaming benchmarks, improving accuracy while reducing latency. These results establish WeaveTime as a practical path toward time aware stream Video-LLMs under strict online, time causal constraints. Code and weights will be made publicly available. Project Page: https://zhangyl4.github.io/publications/weavetime/

LGJan 30
FedDis: A Causal Disentanglement Framework for Federated Traffic Prediction

Chengyang Zhou, Zijian Zhang, Chunxu Zhang et al.

Federated learning offers a promising paradigm for privacy-preserving traffic prediction, yet its performance is often challenged by the non-identically and independently distributed (non-IID) nature of decentralized traffic data. Existing federated methods frequently struggle with this data heterogeneity, typically entangling globally shared patterns with client-specific local dynamics within a single representation. In this work, we postulate that this heterogeneity stems from the entanglement of two distinct generative sources: client-specific localized dynamics and cross-client global spatial-temporal patterns. Motivated by this perspective, we introduce FedDis, a novel framework that, to the best of our knowledge, is the first to leverage causal disentanglement for federated spatial-temporal prediction. Architecturally, FedDis comprises a dual-branch design wherein a Personalized Bank learns to capture client-specific factors, while a Global Pattern Bank distills common knowledge. This separation enables robust cross-client knowledge transfer while preserving high adaptability to unique local environments. Crucially, a mutual information minimization objective is employed to enforce informational orthogonality between the two branches, thereby ensuring effective disentanglement. Comprehensive experiments conducted on four real-world benchmark datasets demonstrate that FedDis consistently achieves state-of-the-art performance, promising efficiency, and superior expandability.

95.2NAApr 16
A structure-preserving parametric approximation for anisotropic geometric flows via an $α$-surface energy matrix

Weizhu Bao, Yifei Li, Wenjun Ying et al.

We propose a structure-preserving parametric approximation for geometric flows with general anisotropic effects. By introducing a hyperparameter $α$, we construct a unified surface energy matrix $\hat{\boldsymbol{G}}_k^α(θ)$ that encompasses all existing formulations of surface energy matrices, and apply it to anisotropic curvature flow. We prove that $α=-1$ is the unique choice achieving optimal energy stability under the necessary and sufficient condition $3\hatγ(θ)\geq\hatγ(θ-π)$, while all other $α\neq-1$ require strictly stronger conditions. The framework extends naturally to general anisotropic geometric flows through a unified velocity discretization that ensures energy stability. Numerical experiments validate the theoretical optimality of $α=-1$ and demonstrate the effectiveness and robustness.

88.2LGMay 11
ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design

Yulin Zhang, He Cao, Zihao Jiang et al.

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, they often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that can effectively balance multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This bridges the gap for OPD in multi-objective/teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising the designability, with an 8x training speedup over RL-based alignment competitors.

AIJul 8, 2025
Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity

Shuai Zhao, Yulin Zhang, Luwei Xiao et al.

Despite the remarkable progress of large language models (LLMs) across various domains, their capacity to predict retinopathy of prematurity (ROP) risk remains largely unexplored. To address this gap, we introduce a novel Chinese benchmark dataset, termed CROP, comprising 993 admission records annotated with low, medium, and high-risk labels. To systematically examine the predictive capabilities and affective biases of LLMs in ROP risk stratification, we propose Affective-ROPTester, an automated evaluation framework incorporating three prompting strategies: Instruction-based, Chain-of-Thought (CoT), and In-Context Learning (ICL). The Instruction scheme assesses LLMs' intrinsic knowledge and associated biases, whereas the CoT and ICL schemes leverage external medical knowledge to enhance predictive accuracy. Crucially, we integrate emotional elements at the prompt level to investigate how different affective framings influence the model's ability to predict ROP and its bias patterns. Empirical results derived from the CROP dataset yield two principal observations. First, LLMs demonstrate limited efficacy in ROP risk prediction when operating solely on intrinsic knowledge, yet exhibit marked performance gains when augmented with structured external inputs. Second, affective biases are evident in the model outputs, with a consistent inclination toward overestimating medium- and high-risk cases. Third, compared to negative emotions, positive emotional framing contributes to mitigating predictive bias in model outputs. These findings highlight the critical role of affect-sensitive prompt engineering in enhancing diagnostic reliability and emphasize the utility of Affective-ROPTester as a framework for evaluating and mitigating affective bias in clinical language modeling systems.

CVOct 16, 2025
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video

Yulin Zhang, Cheng Shi, Yang Wang et al.

Envision an AI capable of functioning in human-like settings, moving beyond mere observation to actively understand, anticipate, and proactively respond to unfolding events. Towards this vision, we focus on the innovative task where, given ego-streaming video input, an assistant proactively answers diverse, evolving questions at the opportune moment, while maintaining synchronized perception and reasoning. This task embodies three key properties: (1) Proactive Coherence, (2) Just-in-Time Responsiveness, and (3) Synchronized Efficiency. To evaluate and address these properties, we first introduce ESTP-Bench (Ego Streaming Proactive Benchmark) alongside the ESTP-F1 metric-a novel framework designed for their rigorous assessment. Secondly, we propose a comprehensive technical pipeline to enable models to tackle this challenging task. This pipeline comprises: (1) a data engine, (2) a multi-stage training strategy, and (3) a proactive dynamic compression technique. Our proposed model effectively addresses these critical properties while outperforming multiple baselines across diverse online and offline benchmarks. Project Page:https://zhangyl4.github.io/publications/eyes-wide-open/

CVAug 31, 2025
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection

Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.

Detection transformers have been applied to human-object interaction (HOI) detection, enhancing the localization and recognition of human-action-object triplets in images. Despite remarkable progress, this study identifies a critical issue-"Toxic Siblings" bias-which hinders the interaction decoder's learning, as numerous similar yet distinct HOI triplets interfere with and even compete against each other both input side and output side to the interaction decoder. This bias arises from high confusion among sibling triplets/categories, where increased similarity paradoxically reduces precision, as one's gain comes at the expense of its toxic sibling's decline. To address this, we propose two novel debiasing learning objectives-"contrastive-then-calibration" and "merge-then-split"-targeting the input and output perspectives, respectively. The former samples sibling-like incorrect HOI triplets and reconstructs them into correct ones, guided by strong positional priors. The latter first learns shared features among sibling categories to distinguish them from other groups, then explicitly refines intra-group differentiation to preserve uniqueness. Experiments show that we significantly outperform both the baseline (+9.18% mAP on HICO-Det) and the state-of-the-art (+3.59% mAP) across various settings.

AIDec 3, 2021
Learning a Robust Multiagent Driving Policy for Traffic Congestion Reduction

Yulin Zhang, William Macke, Jiaxun Cui et al.

In most modern cities, traffic congestion is one of the most salient societal challenges. Past research has shown that inserting a limited number of autonomous vehicles (AVs) within the traffic flow, with driving policies learned specifically for the purpose of reducing congestion, can significantly improve traffic conditions. However, to date these AV policies have generally been evaluated under the same limited conditions under which they were trained. On the other hand, to be considered for practical deployment, they must be robust to a wide variety of traffic conditions. This article establishes for the first time that a multiagent driving policy can be trained in such a way that it generalizes to different traffic flows, AV penetration, and road geometries, including on multi-lane roads. Inspired by our successful results in a high-fidelity microsimulation, this article further contributes a novel extension of the well-known Cell Transmission Model (CTM) that, unlike past CTMs, is suitable for modeling congestion in traffic networks, and is thus suitable for studying congestion-reduction policies such as those considered in this article.

ROJul 15, 2021
On nondeterminism in combinatorial filters

Yulin Zhang, Dylan A. Shell

The problem of combinatorial filter reduction arises from questions of resource optimization in robots; it is one specific way in which automation can help to achieve minimalism, to build better, simpler robots. This paper contributes a new definition of filter minimization that is broader than its antecedents, allowing filters (input, output, or both) to be nondeterministic. This changes the problem considerably. Nondeterministic filters are able to re-use states to obtain, essentially, more 'behavior' per vertex. We show that the gap in size can be significant (larger than polynomial), suggesting such cases will generally be more challenging than deterministic problems. Indeed, this is supported by the core computational complexity result established in this paper: producing nondeterministic minimizers is PSPACE-hard. The hardness separation for minimization which exists between deterministic filter and deterministic automata, thus, does not hold for the nondeterministic case.

ROJun 1, 2021
Lattices of sensors reconsidered when less information is preferred

Yulin Zhang, Dylan A. Shell

To treat sensing limitations (with uncertainty in both conflation of information and noise) we model sensors as covers. This leads to a semilattice organization of abstract sensors that is appropriate even when additional information is problematic (e.g., for tasks involving privacy considerations).

CVApr 19, 2021
Self-Paced Uncertainty Estimation for One-shot Person Re-Identification

Yulin Zhang, Bo Ma, Longyao Liu et al.

The one-shot Person Re-ID scenario faces two kinds of uncertainties when constructing the prediction model from $X$ to $Y$. The first is model uncertainty, which captures the noise of the parameters in DNNs due to a lack of training data. The second is data uncertainty, which can be divided into two sub-types: one is image noise, where severe occlusion and the complex background contain irrelevant information about the identity; the other is label noise, where mislabeled affects visual appearance learning. In this paper, to tackle these issues, we propose a novel Self-Paced Uncertainty Estimation Network (SPUE-Net) for one-shot Person Re-ID. By introducing a self-paced sampling strategy, our method can estimate the pseudo-labels of unlabeled samples iteratively to expand the labeled samples gradually and remove model uncertainty without extra supervision. We divide the pseudo-label samples into two subsets to make the use of training samples more reasonable and effective. In addition, we apply a Co-operative learning method of local uncertainty estimation combined with determinacy estimation to achieve better hidden space feature mining and to improve the precision of selected pseudo-labeled samples, which reduces data uncertainty. Extensive comparative evaluation experiments on video-based and image-based datasets show that SPUE-Net has significant advantages over the state-of-the-art methods.

CVFeb 6, 2021
Two-Step Image Dehazing with Intra-domain and Inter-domain Adaptation

Xin Yi, Bo Ma, Yulin Zhang et al.

Caused by the difference of data distributions, intra-domain gap and inter-domain gap are widely present in image processing tasks. In the field of image dehazing, certain previous works have paid attention to the inter-domain gap between the synthetic domain and the real domain. However, those methods only establish the connection from the source domain to the target domain without taking into account the large distribution shift within the target domain (intra-domain gap). In this work, we propose a Two-Step Dehazing Network (TSDN) with an intra-domain adaptation and a constrained inter-domain adaptation. First, we subdivide the distributions within the synthetic domain into subsets and mine the optimal subset (easy samples) by loss-based supervision. To alleviate the intra-domain gap of the synthetic domain, we propose an intra-domain adaptation to align distributions of other subsets to the optimal subset by adversarial learning. Finally, we conduct the constrained inter-domain adaptation from the real domain to the optimal subset of the synthetic domain, alleviating the domain shift between domains as well as the distribution shift within the real domain. Extensive experimental results demonstrate that our framework performs favorably against the state-of-the-art algorithms both on the synthetic datasets and the real datasets.

CVNov 30, 2020
AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection

Longyao Liu, Bo Ma, Yulin Zhang et al.

Few-shot object detection (FSOD) aims at learning a detector that can fast adapt to previously unseen objects with scarce annotated examples, which is challenging and demanding. Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component (e.g., RoI head) in the detector, yet few of them take the distinct preferences of two subtasks towards feature embedding into consideration. In this paper, we carefully analyze the characteristics of FSOD, and present that a general few-shot detector should consider the explicit decomposition of two subtasks, as well as leveraging information from both of them to enhance feature representations. To the end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing Dual Query Encoder and Dual Attention Generator for separate feature extraction, and Dual Aggregator for separate model reweighting. Spontaneously, separate state estimation is achieved by the R-CNN detector. Besides, for the acquisition of enhanced feature representations, we further introduce Adaptive Fusion Mechanism to adaptively perform feature fusion in different subtasks. Extensive experiments on PASCAL VOC and MS COCO in various settings show that, our method achieves new state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.

RONov 6, 2020
Accelerating combinatorial filter reduction through constraints

Yulin Zhang, Hazhar Rahmani, Dylan A. Shell et al.

Reduction of combinatorial filters involves compressing state representations that robots use. Such optimization arises in automating the construction of minimalist robots. But exact combinatorial filter reduction is an NP-complete problem and all current techniques are either inexact or formalized with exponentially many constraints. This paper proposes a new formalization needing only a polynomial number of constraints, and characterizes these constraints in three different forms: nonlinear, linear, and conjunctive normal form. Empirical results show that constraints in conjunctive normal form capture the problem most effectively, leading to a method that outperforms the others. Further examination indicates that a substantial proportion of constraints remain inactive during iterative filter reduction. To leverage this observation, we introduce just-in-time generation of such constraints, which yields improvements in efficiency and has the potential to minimize large filters.

ROMay 22, 2020
Abstractions for computing all robotic sensors that suffice to solve a planning problem

Yulin Zhang, Dylan A. Shell

Whether a robot can perform some specific task depends on several aspects, including the robot's sensors and the plans it possesses. We are interested in search algorithms that treat plans and sensor designs jointly, yielding solutions---i.e., plan and sensor characterization pairs---if and only if they exist. Such algorithms can help roboticists explore the space of sensors to aid in making design trade-offs. Generalizing prior work where sensors are modeled abstractly as sensor maps on p-graphs, the present paper increases the potential sensors which can be sought significantly. But doing so enlarges a problem currently on the outer limits of being considered tractable. Toward taming this complexity, two contributions are made: (1) we show how to represent the search space for this more general problem and describe data structures that enable whole sets of sensors to be summarized via a single special representative; (2) we give a means by which other structure (either task domain knowledge, sensor technology or fabrication constraints) can be incorporated to reduce the sets to be enumerated. These lead to algorithms that we have implemented and which suffice to solve particular problem instances, albeit only of small scale. Nevertheless, the algorithm aids in helping understand what attributes sensors must possess and what information they must provide in order to ensure a robot can achieve its goals despite non-determinism.

DMFeb 15, 2020
Cover Combinatorial Filters and their Minimization Problem (Extended Version)

Yulin Zhang, Dylan A. Shell

Recent research has examined algorithms to minimize robots' resource footprints. The class of combinatorial filters (discrete variants of widely-used probabilistic estimators) has been studied and methods for reducing their space requirements introduced. This paper extends existing combinatorial filters by introducing a natural generalization that we dub cover combinatorial filters. In addressing the new -- but still NP-complete -- problem of minimization of cover filters, this paper shows that multiple concepts previously believed to be true about combinatorial filters (and actually conjectured, claimed, or assumed to be) are in fact false. For instance, minimization does not induce an equivalence relation. We give an exact algorithm for the cover filter minimization problem. Unlike prior work (based on graph coloring) we consider a type of clique-cover problem, involving a new conditional constraint, from which we can find more general relations. In addition to solving the more general problem, the algorithm also corrects flaws present in all prior filter reduction methods. In employing SAT, the algorithm provides a promising basis for future practical development.

ROOct 9, 2018
What does my knowing your plans tell me?

Yulin Zhang, Dylan A. Shell, Jason M. O'Kane

For robots acting in the presence of observers, we examine the information that is divulged if the observer is party to the robot's plan. Privacy constraints are specified as the stipulations on what can be inferred during plan execution. We imagine a case in which the robot's plan is divulged beforehand, so that the observer can use this {\em a priori} information along with the disclosed executions. The divulged plan, which can be represented by a procrustean graph, is shown to undermine privacy precisely to the extent that it can eliminate action-observation sequences that will never appear in the plan. Future work will consider how the divulged plan might be sought as the output of a planning procedure.

ROSep 25, 2018
Finding plans subject to stipulations on what information they divulge

Yulin Zhang, Dylan A. Shell, Jason M. O'Kane

Motivated by applications where privacy is important, we consider planning problems for robots acting in the presence of an observer. We first formulate and then solve planning problems subject to stipulations on the information divulged during plan execution --- the appropriate solution concept being both a plan and an information disclosure policy. We pose this class of problem under a worst-case model within the framework of procrustean graphs, formulating the disclosure policy as a particular type of map on edge labels. We devise algorithms that, given a planning problem supplemented with an information stipulation, can find a plan, associated disclosure policy, or both if some exists. Both the plan and associated disclosure policy may depend subtlety on additional information available to the observer, such as whether the observer knows the robot's plan (e.g., leaked via a side-channel). Our implementation finds a plan and a suitable disclosure policy, jointly, when any such pair exists, albeit for small problem instances.

ROMay 31, 2018
Complete characterization of a class of privacy-preserving tracking problems

Yulin Zhang, Dylan A. Shell

We examine the problem of target tracking whilst simultaneously preserving the target's privacy as epitomized by the robotic panda tracking scenario, which O'Kane introduced at the 2008 Workshop on the Algorithmic Foundations of Robotics in order to elegantly illustrate the utility of ignorance. The present paper reconsiders his formulation and the tracking strategy he proposed, along with its completeness. We explore how the capabilities of the robot and panda affect the feasibility of tracking with a privacy stipulation, uncovering intrinsic limits, no matter the strategy employed. This paper begins with a one-dimensional setting and, putting the trivially infeasible problems aside, analyzes the strategy space as a function of problem parameters. We show that it is not possible to actively track the target as well as protect its privacy for every nontrivial pair of tracking and privacy stipulations. Secondly, feasibility can be sensitive, in several cases, to the information available to the robot initially. Quite naturally in the one-dimensional model, one may quantify sensing power by the number of perceptual (or output) classes available to the robot. The robot's power to achieve privacy-preserving tracking is bounded, converging asymptotically with increasing sensing power. We analyze the entire space of possible tracking problems, characterizing every instance as either achievable, constructively by giving a policy where one exists (some of which depend on the initial information), or proving the instance impossible. Finally, to relate some of the impossibility results in one dimension to their higher-dimensional counterparts, including the planar panda tracking problem studied by O'Kane, we establish a connection between tracking dimensionality and the sensing power of a one-dimensional robot.