CVMay 31
COLLAR: Cascaded Object-Level Latent Refinement for High-Fidelity Conditional GenerationXinlong Zhang, Jia Wei, Xiaoyu Zhang et al.
Achieving high-fidelity object-level control in Diffusion Transformers remains a significant challenge despite the introduction of structural priors like depth and Canny maps. Current object-level conditional generation methods frequently suffer from visual artifacts and struggle to maintain precise control over objects within small localized regions. To address these limitations, we propose Cascaded Object-Level Latent Refinement (COLLAR), a training-free framework that progressively optimizes object-level features via the Field-of-View (FoV) expansion. First, we propose the Cross-Scale Semantic Alignment (CSSA) module to address spatial-semantic gaps by injecting object-level features into extended-FoV branches via attention mechanisms. To further optimize these features, the Cyclic Feature Injection (CFI) module introduces a reciprocal background feedback mechanism. It leverages a frequency-based adaptive strategy to selectively update the global backbone with context-aligned local information. Finally, the extended-FoV branch serves as a hub for feature optimization, ensuring that object-level features are integrated into the global generation process without compromising final image quality. Extensive experiments on the COCO-MIG and COCO-POS benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods across semantic alignment, image quality, and spatial fidelity.
CVJul 10, 2024Code
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest TransformerTongkun Guan, Chengyu Lin, Wei Shen et al.
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios, such as digitized education and automated offices. Recently, sequence-based models with encoder-decoder architectures have been commonly adopted to address this task by directly predicting LaTeX sequences of expression images. However, these methods only implicitly learn the syntax rules provided by LaTeX, which may fail to describe the position and hierarchical relationship between symbols due to complex structural relations and diverse handwriting styles. To overcome this challenge, we propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition, to explicitly enable position-aware symbol feature representation learning. Specifically, we first design a position forest that models the mathematical expression as a forest structure and parses the relative position relationships between symbols. Without requiring extra annotations, each symbol is assigned a position identifier in the forest to denote its relative spatial position. Second, we propose an implicit attention correction module to accurately capture attention for HMER in the sequence-based decoder architecture. Extensive experiments validate the superiority of PosFormer, which consistently outperforms the state-of-the-art methods 2.03%/1.22%/2.00%, 1.83%, and 4.62% gains on the single-line CROHME 2014/2016/2019, multi-line M2E, and complex MNE datasets, respectively, with no additional latency or computational cost. Code is available at https://github.com/SJTU-DeepVisionLab/PosFormer.
CCMar 23, 2022
New Distinguishers for Negation-Limited Weak Pseudorandom FunctionsZhihuai Chen, Siyao Guo, Qian Li et al.
We show how to distinguish circuits with $\log k$ negations (a.k.a $k$-monotone functions) from uniformly random functions in $\exp\left(\tilde{O}\left(n^{1/3}k^{2/3}\right)\right)$ time using random samples. The previous best distinguisher, due to the learning algorithm by Blais, Cannone, Oliveira, Servedio, and Tan (RANDOM'15), requires $\exp\big(\tilde{O}(n^{1/2} k)\big)$ time. Our distinguishers are based on Fourier analysis on \emph{slices of the Boolean cube}. We show that some "middle" slices of negation-limited circuits have strong low-degree Fourier concentration and then we apply a variation of the classic Linial, Mansour, and Nisan "Low-Degree algorithm" (JACM'93) on slices. Our techniques also lead to a slightly improved weak learner for negation limited circuits under the uniform distribution.
CVNov 10, 2025
Predict and Resist: Long-Term Accident Anticipation under Sensor NoiseXingcheng Liu, Bin Rao, Yanchen Guan et al.
Accident anticipation is essential for proactive and safe autonomous driving, where even a brief advance warning can enable critical evasive actions. However, two key challenges hinder real-world deployment: (1) noisy or degraded sensory inputs from weather, motion blur, or hardware limitations, and (2) the need to issue timely yet reliable predictions that balance early alerts with false-alarm suppression. We propose a unified framework that integrates diffusion-based denoising with a time-aware actor-critic model to address these challenges. The diffusion module reconstructs noise-resilient image and object features through iterative refinement, preserving critical motion and interaction cues under sensor degradation. In parallel, the actor-critic architecture leverages long-horizon temporal reasoning and time-weighted rewards to determine the optimal moment to raise an alert, aligning early detection with reliability. Experiments on three benchmark datasets (DAD, CCD, A3D) demonstrate state-of-the-art accuracy and significant gains in mean time-to-accident, while maintaining robust performance under Gaussian and impulse noise. Qualitative analyses further show that our model produces earlier, more stable, and human-aligned predictions in both routine and highly complex traffic scenarios, highlighting its potential for real-world, safety-critical deployment.
ROJul 29, 2021
Mapping Human Muscle Force to Supernumerary Robotics Device for Overhead Task AssistanceJianwen Luo, Sicong Liu, Chengyu Lin et al.
Supernumerary Robotics Device (SRD) is an ideal solution to provide robotic assistance in overhead manual manipulation. Since two arms are occupied for the overhead task, it is desired to have additional arms to assist us in achieving other subtasks such as supporting the far end of a long plate and pushing it upward to fit in the ceiling. In this study, a method that maps human muscle force to SRD for overhead task assistance is proposed. Our methodology is to utilize redundant DoFs such as the idle muscles in the leg to control the supporting force of the SRD. A sEMG device is worn on the operator's shank where muscle signals are measured, parsed, and transmitted to SRD for control. In the control aspect, we adopted stiffness control in the task space based on torque control at the joint level. We are motivated by the fact that humans can achieve daily manipulation merely through simple inherent compliance property in joint driven by muscles. We explore to estimate the force of some particular muscles in humans and control the SRD to imitate the behaviors of muscle and output supporting forces to accomplish the subtasks such as overhead supporting. The sEMG signals detected from human muscles are extracted, filtered, rectified, and parsed to estimate the muscle force. We use this force information as the intent of the operator for proper overhead supporting force. As one of the well-known compliance control methods, stiffness control is easy to achieve using a few of straightforward parameters such as stiffness and equilibrium point. Through tuning the stiffness and equilibrium point, the supporting force of SRD in task space can be easily controlled. The muscle force estimated by sEMG is mapped to the desired force in the task space of the SRD. The desired force is transferred into stiffness or equilibrium point to output the corresponding supporting force.
DSJun 17, 2018
Subspace Embedding and Linear Regression with Orlicz NormAlexandr Andoni, Chengyu Lin, Ying Sheng et al.
We consider a generalization of the classic linear regression problem to the case when the loss is an Orlicz norm. An Orlicz norm is parameterized by a non-negative convex function $G:\mathbb{R}_+\rightarrow\mathbb{R}_+$ with $G(0)=0$: the Orlicz norm of a vector $x\in\mathbb{R}^n$ is defined as $ \|x\|_G=\inf\left\{α>0\large\mid\sum_{i=1}^n G(|x_i|/α)\leq 1\right\}. $ We consider the cases where the function $G(\cdot)$ grows subquadratically. Our main result is based on a new oblivious embedding which embeds the column space of a given matrix $A\in\mathbb{R}^{n\times d}$ with Orlicz norm into a lower dimensional space with $\ell_2$ norm. Specifically, we show how to efficiently find an embedding matrix $S\in\mathbb{R}^{m\times n},m<n$ such that $\forall x\in\mathbb{R}^{d},Ω(1/(d\log n)) \cdot \|Ax\|_G\leq \|SAx\|_2\leq O(d^2\log n) \cdot \|Ax\|_G.$ By applying this subspace embedding technique, we show an approximation algorithm for the regression problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_G$, up to a $O(d\log^2 n)$ factor. As a further application of our techniques, we show how to also use them to improve on the algorithm for the $\ell_p$ low rank matrix approximation problem for $1\leq p<2$.