Haisu Wu


2 Papers

31.9 · CV · May 28
GenEraser: Generalizable Video Object Removal via Balanced Text-Mask Guidance and Decoupled Locator-Preserver

Yuqing Chen, Lin Liu, Haisu Wu et al.

Video object removal frequently struggles to simultaneously eliminate target objects and their associated physical effects (e.g., smoke, reflections, light, and ripples) in out-of-domain scenarios due to complex spatiotemporal ambiguities. Existing methods rely primarily on spatial masks, which often fail to capture weakly correlated effects, while the potential of explicit textual guidance remains underexplored. Furthermore, removal models face a fundamental optimization conflict between high-level semantic generalization and precise pixel-level background preservation. To address these challenges, we propose GenEraser, a novel framework for generalizable, high-fidelity video object and effect removal. First, we introduce a Multi-Conditional Mixture-of-Experts (MC-MoE) paired with Bipartite Text guidance to fully exploit the multimodal priors of Diffusion Transformers, significantly improving the identification of complex effects. Second, we develop a Learnable Deep "CFG" Fusion mechanism (LD-CFG) that adaptively balances the relative dominance of mask and textual conditions across diverse scenarios. Finally, we propose a Decoupled Expert Architecture, comprising a Locator and a Preserver, to mitigate the inherent trade-off between semantic generalization and pixel alignment. Extensive experiments demonstrate that GenEraser surpasses recent state-of-the-art approaches, achieving significant quantitative improvements (e.g., 2.16 dB and 1.44 dB on the ROSE Benchmark and VOR-Eval, respectively) while maintaining robust generalization in open-world scenarios. https://cyqii.github.io/GenEraser.github.io/
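As a rough illustration of what "balancing mask and text conditions" means at the level of classifier-free guidance (CFG), the sketch below combines an unconditional noise prediction with a mask-conditioned and a text-conditioned one, weighting the two guidance directions by a softmax over two logits. This is only the standard output-level multi-condition CFG arithmetic with a learnable-style weighting; the function name `fused_guidance`, the softmax parameterization and the `scale` parameter are assumptions for illustration, and the paper's LD-CFG is described as a learned "deep" mechanism that may work quite differently.

```python
import numpy as np

def fused_guidance(eps_uncond, eps_mask, eps_text, logits, scale):
    """Hypothetical sketch: blend mask- and text-conditioned guidance.

    eps_*  : noise predictions of identical shape from a diffusion model
    logits : two scores (mask, text); in a learned setting these would be
             trainable or predicted per sample (assumption)
    scale  : overall guidance strength, as in ordinary CFG
    """
    # Softmax so the two condition weights are positive and sum to 1.
    w = np.exp(logits - np.max(logits))
    w = w / w.sum()
    # Each condition contributes its own guidance direction relative to
    # the unconditional prediction.
    delta = w[0] * (eps_mask - eps_uncond) + w[1] * (eps_text - eps_uncond)
    return eps_uncond + scale * delta, w

# Usage: equal logits give equal weight to the mask and text directions;
# raising logits[0] would let the mask condition dominate instead.
eps_u = np.zeros(3)
out, w = fused_guidance(eps_u, np.ones(3), 3 * np.ones(3), np.array([0.0, 0.0]), 2.0)
```

Normalizing the weights keeps the total guidance magnitude controlled by `scale` alone, so shifting the logits changes only which condition dominates, not how strongly the sample is pushed overall.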

18.6 · IT · Mar 10
Tensor Train Decomposition-based Channel Estimation for MIMO-AFDM Systems with Fractional Delay and Doppler

Ruizhe Wang, Cunhua Pan, Hong Ren et al.

Affine Frequency Division Multiplexing (AFDM) has emerged as a promising chirp-based multicarrier technology for high-speed communication systems. To fully exploit the diversity gain offered by AFDM, accurate channel estimation is essential. However, existing studies have mainly focused on the integer-delay-tap scenario and single-symbol pilot-based estimation. Since delay taps in practice are generally fractional, approximating them as integers not only degrades delay estimation accuracy but also severely affects Doppler frequency estimation. To address this problem, we investigate channel estimation for multiple-input multiple-output (MIMO)-AFDM systems. A time-affine frequency (T-AF) domain pilot structure is proposed to exploit time-domain phase variations. By leveraging the rotational invariance property in the spatial and temporal domains, we develop a channel estimation algorithm based on Vandermonde-structured tensor-train (TT) decomposition. The proposed algorithm achieves superior computational efficiency compared with state-of-the-art parameter estimation methods. Moreover, unlike existing studies, we derive the global Ziv-Zakai bound (ZZB) as an alternative to the Cramér-Rao bound (CRB) for lower-bounding the parameter estimation error. Numerical results show that the derived ZZB provides a tighter global performance characterization and captures the threshold phenomenon in mean square error (MSE) performance in the low-SNR regime. Furthermore, the proposed algorithm achieves better communication performance than existing schemes while reducing execution time by an order of magnitude compared with state-of-the-art iterative algorithms.
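The rotational (shift) invariance that the algorithm relies on can be seen in one dimension: a Vandermonde vector x[n] = c·exp(j2πfn) satisfies x[n+1] = z·x[n] with z = exp(j2πf), so f can be recovered from the phase of z even when it is fractional. The minimal sketch below shows only this single-path, noiseless principle, estimating z by least squares; the name `estimate_frequency` is hypothetical, and the paper's method instead applies the idea jointly across spatial and temporal dimensions within a tensor-train decomposition.

```python
import numpy as np

def estimate_frequency(samples):
    """Hypothetical sketch: estimate normalized frequency f of
    samples[n] = c * exp(1j*2*pi*f*n) via shift invariance.
    Returns f in (-0.5, 0.5]."""
    x = np.asarray(samples, dtype=complex)
    head, tail = x[:-1], x[1:]
    # Least-squares solution of tail ≈ z * head; np.vdot conjugates its
    # first argument, giving sum(conj(head) * tail) / sum(|head|^2).
    z = np.vdot(head, tail) / np.vdot(head, head)
    return np.angle(z) / (2 * np.pi)

# Usage: a fractional frequency that is not a multiple of 1/32.
n = np.arange(32)
x = 2.0 * np.exp(1j * 2 * np.pi * 0.137 * n)
print(round(estimate_frequency(x), 6))  # 0.137
```

Because the estimate comes from the phase of z rather than from an FFT grid, no integer-bin approximation is involved, which is the kind of error the abstract attributes to treating fractional delays as integers.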