Jiangshan He

h-index6
2papers

2 Papers

SDDec 30, 2025
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

Tianxin Xie, Wentao Lei, Guanjie Huang et al.

Text-to-audio-video (T2AV) generation underpins a wide range of applications demanding realistic audio-visual content, including virtual reality, world modeling, gaming, and filmmaking. However, existing T2AV models remain incapable of generating physically plausible sounds, primarily due to their limited understanding of physical principles. To situate current research progress, we present PhyAVBench, a challenging audio physics-sensitivity benchmark designed to systematically evaluate the audio physics grounding capabilities of existing T2AV models. PhyAVBench comprises 1,000 groups of paired text prompts with controlled physical variables that implicitly induce sound variations, enabling a fine-grained assessment of models' sensitivity to changes in underlying acoustic conditions. We term this evaluation paradigm the Audio-Physics Sensitivity Test (APST). Unlike prior benchmarks that primarily focus on audio-video synchronization, PhyAVBench explicitly evaluates models' understanding of the physical mechanisms underlying sound generation, covering 6 major audio physics dimensions, 4 daily scenarios (music, sound effects, speech, and their mix), and 50 fine-grained test points, ranging from fundamental aspects such as sound diffraction to more complex phenomena, e.g., Helmholtz resonance. Each test point consists of multiple groups of paired prompts, where each prompt is grounded by at least 20 newly recorded or collected real-world videos, thereby minimizing the risk of data leakage during model pre-training. Both prompts and videos are iteratively refined through rigorous human-involved error correction and quality control to ensure high quality. We argue that only models with a genuine grasp of audio-related physical principles can generate physically consistent audio-visual content. We hope PhyAVBench will stimulate future progress in this critical yet largely unexplored domain.

IVMay 29, 2025
Dc-EEMF: Pushing depth-of-field limit of photoacoustic microscopy via decision-level constrained learning

Wangting Zhou, Jiangshan He, Tong Cai et al.

Photoacoustic microscopy holds the potential to measure biomarkers' structural and functional status without labels, which significantly aids in comprehending pathophysiological conditions in biomedical research. However, conventional optical-resolution photoacoustic microscopy (OR-PAM) is hindered by a limited depth-of-field (DoF) due to the narrow depth range focused on a Gaussian beam. Consequently, it fails to resolve sufficient details in the depth direction. Herein, we propose a decision-level constrained end-to-end multi-focus image fusion (Dc-EEMF) to push DoF limit of PAM. The DC-EEMF method is a lightweight siamese network that incorporates an artifact-resistant channel-wise spatial frequency as its feature fusion rule. The meticulously crafted U-Net-based perceptual loss function for decision-level focus properties in end-to-end fusion seamlessly integrates the complementary advantages of spatial domain and transform domain methods within Dc-EEMF. This approach can be trained end-to-end without necessitating post-processing procedures. Experimental results and numerical analyses collectively demonstrate our method's robust performance, achieving an impressive fusion result for PAM images without a substantial sacrifice in lateral resolution. The utilization of Dc-EEMF-powered PAM has the potential to serve as a practical tool in preclinical and clinical studies requiring extended DoF for various applications.