Zhe Tong

SD
h-index: 1
3 papers
23 citations
Novelty: 45%
AI Score: 49

3 Papers

CV · May 14 · Code
H-OmniStereo: Zero-Shot Omnidirectional Stereo Matching with Heading-Aligned Normal Priors

Chenxing Jiang, Zhe Tong, Pusen Gao et al.

Stereo matching on top-bottom equirectangular images provides an effective framework for full-surround perception, as vertically aligned epipolar lines enable the use of advanced perspective stereo architectures that are largely driven by large-scale datasets and monocular priors. However, the performance of such adaptations is severely limited by the scarcity of omnidirectional stereo datasets and the degradation of perspective monocular priors under spherical distortions. To address these challenges, we propose H-OmniStereo, a zero-shot omnidirectional stereo matching framework. First, we construct a high-quality synthetic dataset comprising over 2.8 million top-bottom equirectangular stereo pairs to scale up training. Second, we introduce an equirectangular monocular normal estimator, specifically operating in a heading-aligned coordinate system. Beyond providing distortion-robust and cross-view-consistent geometric priors for establishing reliable correspondences in stereo matching, this design boosts training efficiency and accommodates train-test FoV mismatches. Extensive experiments show that our approach achieves higher accuracy than existing methods on out-of-domain datasets and successfully generalizes to real-world consumer camera setups using a single model. Both the model and the dataset will be open-sourced.

SD · Aug 12, 2025
Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization

Chaoqun Cui, Liangbin Huang, Shijing Wang et al.

Video dubbing aims to translate original speech in visual media programs from the source language to the target language, relying on neural machine translation and text-to-speech technologies. Due to varying information densities across languages, target speech often mismatches the source speech duration, causing audio-video synchronization issues that significantly impact viewer experience. In this study, we approach duration alignment in LLM-based video dubbing machine translation as a preference optimization problem. We propose the Segment Supervised Preference Optimization (SSPO) method, which employs a segment-wise sampling strategy and fine-grained loss to mitigate duration mismatches between source and target lines. Experimental results demonstrate that SSPO achieves superior performance in duration alignment tasks.
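
To make the idea of treating duration alignment as a preference problem concrete, here is a minimal sketch. It is not the SSPO method itself: the abstract does not give the sampling rule or the loss, so the pair-selection rule (`pick_preference_pair`), the example durations, and the use of a generic DPO-style loss (`dpo_loss`) are all assumptions made for illustration.

```python
import math


def pick_preference_pair(source_duration, candidates):
    """Pick a (chosen, rejected) pair for one dubbing segment.

    candidates is a list of (text, duration_seconds) tuples. Assumption:
    the candidate closest in duration to the source line is preferred and
    the one farthest away is rejected.
    """
    ranked = sorted(candidates, key=lambda c: abs(c[1] - source_duration))
    return ranked[0], ranked[-1]


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Generic DPO-style loss for one pair (assumed form, not SSPO's loss).

    Inputs are sequence log-probabilities under the policy (logp_*) and
    under a frozen reference model (ref_*).
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical segment: the source line lasts 2.0 s and there are three
# candidate translations with estimated speech durations.
chosen, rejected = pick_preference_pair(
    2.0, [("a", 3.5), ("b", 2.1), ("c", 1.0)]
)
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

In this sketch the loss shrinks as the policy assigns relatively more probability to the duration-matched candidate than the reference model does.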

SD · Jul 31, 2017
Bearing fault diagnosis under varying working condition based on domain adaptation

Bo Zhang, Wei Li, Zhe Tong et al.

Traditional intelligent fault diagnosis of rolling bearings works well only under the common assumption that the labeled training data (source domain) and unlabeled testing data (target domain) are drawn from the same distribution. When the distribution changes, most fault diagnosis models need to be rebuilt from scratch using newly recollected labeled training data. However, it is expensive or impossible to annotate a huge amount of training data to rebuild such a new model. Meanwhile, large amounts of labeled training data have not been fully utilized yet, which is apparently a waste of resources. As one of the important research directions of transfer learning, domain adaptation (DA) typically aims at minimizing the differences between distributions of different domains in order to minimize the cross-domain prediction error by taking full advantage of information coming from both source and target domains. In this paper, we present one of the first studies on unsupervised DA in the field of fault diagnosis of rolling bearings under varying working conditions and propose a novel diagnosis strategy based on unsupervised DA using subspace alignment (SA). After processing by unsupervised DA with SA, the distributions of training data and testing data become close, and the classifier trained on the training data can be used to classify the testing data. Experimental results on 60 domain adaptation diagnosis problems under varying working conditions in the Case Western Reserve benchmark data and 12 domain adaptation diagnosis problems under varying working conditions in our new data demonstrate the effectiveness of the proposed method. The proposed method can effectively distinguish not only bearing fault categories but also fault severities.
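
The subspace alignment step mentioned in this abstract can be sketched in a few lines of NumPy. This follows the commonly described form of SA (PCA bases for each domain, then a linear map from the source basis to the target basis). The paper's feature extraction and classifier are not given here, so the function names, the subspace size `k`, and the 1-nearest-neighbor classifier are assumptions for illustration.

```python
import numpy as np


def pca_basis(X, k):
    """Return the top-k principal directions of X as a (d, k) matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def subspace_alignment(Xs, Xt, k):
    """Project source and target features into aligned k-dimensional subspaces."""
    Ps = pca_basis(Xs, k)  # source basis, shape (d, k)
    Pt = pca_basis(Xt, k)  # target basis, shape (d, k)
    M = Ps.T @ Pt          # (k, k) matrix aligning the source basis to the target
    Zs = (Xs - Xs.mean(axis=0)) @ Ps @ M
    Zt = (Xt - Xt.mean(axis=0)) @ Pt
    return Zs, Zt


def nearest_neighbor_predict(Zs, ys, Zt):
    """Label each target sample with the label of its nearest source sample."""
    d2 = ((Zt[:, None, :] - Zs[None, :, :]) ** 2).sum(axis=2)
    return ys[d2.argmin(axis=1)]
```

Given labeled source features `Xs, ys` and unlabeled target features `Xt`, one would call `Zs, Zt = subspace_alignment(Xs, Xt, k)` and then `nearest_neighbor_predict(Zs, ys, Zt)`. Only the source labels are used, which is what makes the adaptation unsupervised.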