Hongshuo Zhao

2 Papers

92.3 LG May 25 Code
Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

Yulin Yuan, Hongshuo Zhao, Xiangming Meng

Diffusion-based multimodal large language models (dMLLMs) decode by iteratively predicting tokens at multiple masked positions in parallel. This turns each decoding step into a position-selection problem: the model must choose not only which predictions are reliable in isolation, but also which positions should be committed together as context for later decoding steps. Existing confidence-based decoding ranks masked positions independently and commits the top-K positions, largely ignoring whether the committed tokens provide complementary visual grounding. We identify a step-level limitation of this strategy in multimodal settings: high-confidence tokens selected in the same step can rely on overlapping visual grounding, introducing visual redundancy among the committed tokens and leaving less complementary visual grounding available for later decoding. To quantify this effect, we introduce the Visual Redundancy Index (VRI), which measures visual grounding overlap among tokens committed in parallel. To control this redundancy during decoding, we propose Visual-Redundancy-Controlled Decoding (VRCD), a training-free inference-time decoding method that uses token-to-image attention to prioritize visually complementary positions. Across diverse multimodal benchmarks, VRCD reduces visual redundancy and remaining-position entropy with modest runtime overhead. In longer decoding experiments, it also achieves relative accuracy gains of up to 18.8% on M^3CoT and 6.9% on MMBench over confidence-based decoding. Code will be released at https://github.com/infiniteYuanyl/VRCD.
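The contrast between confidence-only top-K selection and redundancy-aware selection can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the redundancy measure (mean pairwise cosine similarity of token-to-image attention, a stand-in for VRI), the greedy scoring rule, the penalty weight `lam`, and all function names. The released VRCD code may differ in each of these.

```python
import math

def cosine(u, v):
    """Cosine similarity between two attention vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def visual_redundancy(attn, chosen):
    """Assumed VRI proxy: mean pairwise cosine similarity of the
    token-to-image attention rows of the committed positions."""
    pairs = [(i, j) for a, i in enumerate(chosen) for j in chosen[a + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(attn[i], attn[j]) for i, j in pairs) / len(pairs)

def select_topk(conf, k):
    """Baseline: rank masked positions independently by confidence."""
    return sorted(range(len(conf)), key=lambda i: -conf[i])[:k]

def select_complementary(conf, attn, k, lam=0.5):
    """Hypothetical greedy rule: confidence minus a penalty for attention
    overlap with positions already chosen in this step."""
    chosen = []
    remaining = set(range(len(conf)))
    while remaining and len(chosen) < k:
        def score(i):
            overlap = max((cosine(attn[i], attn[j]) for j in chosen), default=0.0)
            return conf[i] - lam * overlap
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy step: three masked positions, three image patches.
conf = [0.95, 0.94, 0.80]
attn = [
    [0.9, 0.1, 0.0],  # position 0 attends mostly to patch 0
    [0.8, 0.2, 0.0],  # position 1 attends to the same patch
    [0.0, 0.1, 0.9],  # position 2 attends to a different patch
]
baseline = select_topk(conf, 2)
controlled = select_complementary(conf, attn, 2)
```

In this toy step the baseline commits positions 0 and 1, which attend to the same patch, while the penalized rule commits positions 0 and 2 and yields a lower value of the assumed redundancy proxy.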

30.9 SY May 23
Asymmetric Adaptation-based Real-time Fault Diagnosis Under Transitional Operating Conditions

Hongshuo Zhao, Zeyi Liu, Xiao He

Data streams in real-world industrial scenarios often contain transitional operating conditions that are not covered during offline training, leading to significant distribution shifts. To bridge the gap between static offline models and dynamic online data, a novel asymmetric adaptation-based fault diagnosis method is proposed in this paper. Specifically, in the offline stage, we employ domain generalization techniques to extract domain-invariant features from multiple stable conditions and construct robust normalized fault prototypes as reference anchors. Subsequently, during online inference, we design an online test-time adaptation method based on a periodic prototype re-projection mechanism to dynamically update prototype positions. Furthermore, we utilize the geometric distribution derived from anchors to guide the updates of classifiers and adopt an asymmetric learning rate strategy for the feature extractor and classifier. The proposed approach ensures rapid adaptation to new transitional conditions while preserving the discriminative power inherited from the offline domain generalization initialization. Experimental results demonstrate that this mechanism effectively leverages offline generalized knowledge to guide online inference, significantly improving robustness in non-stationary environments.
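The ingredients named in the abstract (normalized prototypes as anchors, online prototype updates, and asymmetric learning rates) can be sketched as follows. This is a loose illustration under stated assumptions, not the paper's method: the exponential-moving-average update is a simple stand-in for the periodic prototype re-projection mechanism, the choice of a smaller step for the feature extractor than for the classifier is assumed, and all names and values are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def predict(feature, prototypes):
    """Nearest-prototype classification by cosine similarity.
    `prototypes` maps a label to a unit-length reference vector."""
    f = normalize(feature)
    sims = {label: sum(a * b for a, b in zip(f, p))
            for label, p in prototypes.items()}
    return max(sims, key=sims.get)

def update_prototype(proto, feature, momentum=0.9):
    """Assumed stand-in for the online prototype update: move the prototype
    toward the new feature, then renormalize to unit length."""
    f = normalize(feature)
    mixed = [momentum * p + (1.0 - momentum) * x for p, x in zip(proto, f)]
    return normalize(mixed)

def asymmetric_sgd_step(params, grads, lrs):
    """One SGD step with a separate learning rate per parameter group."""
    return {name: [w - lrs[name] * g for w, g in zip(params[name], grads[name])]
            for name in params}

# Toy usage with hypothetical values.
prototypes = {"normal": [1.0, 0.0], "fault": [0.0, 1.0]}
label = predict([0.2, 0.9], prototypes)
shifted = update_prototype(prototypes["fault"], [1.0, 0.0])

params = {"extractor": [1.0], "classifier": [1.0]}
grads = {"extractor": [1.0], "classifier": [1.0]}
lrs = {"extractor": 0.001, "classifier": 0.1}  # assumed: slow extractor, fast classifier
stepped = asymmetric_sgd_step(params, grads, lrs)
```

With identical gradients, the classifier group moves much further than the extractor group, which is one way to adapt quickly online while keeping the offline features largely intact.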