Rong Yang

h-index 63

4 papers

51 citations

Novelty 53%

AI Score 37

Ranked #116,220 of 201,018 authors (top 58%); #37,346 in CV (top 63%)

4 Papers

NA · Jun 4

Error Analysis of Tr-PINNs Algorithm for 2D Incompressible Navier-Stokes Equations with Non-Homogeneous Boundary Conditions

Dongjie Liu, Xuebo Li, Rong Yang

Physics-informed neural networks (PINNs) have been widely applied to solve the Navier-Stokes equations by requiring the outputs and gradients of deep models to satisfy the target equations. However, conventional PINNs constrain the boundary terms only through the $L^2$-norm when the equations have non-homogeneous boundary conditions. This single-constraint strategy may yield inaccurate boundary values, which in turn degrades prediction accuracy. To resolve this issue, this paper proposes an improved physics-informed neural network, called Tr-PINNs, that corrects the error of the boundary value. Based on results for the non-homogeneous Stokes problem, an error analysis of the Tr-PINNs algorithm is established. Numerical experiments demonstrate the efficacy of the Tr-PINNs algorithm and show that it achieves a remarkable improvement in computational accuracy.

CV · Feb 29, 2024 · Code

Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

Nuo Xu, Wen Wang, Rong Yang et al.

Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The agent's ability to comprehend its surroundings plays a crucial role in finding objects successfully. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and a vote-counting strategy to construct graph representations of scenes, which results in misalignment with the visual images. To provide more accurate and coherent scene descriptions and address this misalignment, we propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Technically, our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language descriptions with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment gives the navigator a remarkable zero-shot navigation capability. We extensively evaluate our method using the AI2-THOR simulator and conduct a series of experiments to demonstrate the effectiveness and efficiency of our navigator. Code available: https://github.com/nuoxu/AKGVP.

CV · Mar 14, 2025 · Code

Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)

Kelu Yao, Nuo Xu, Rong Yang et al.

This paper introduces a holistic vision-language foundation model tailored for remote sensing, named Falcon. Falcon offers a unified, prompt-based paradigm that effectively executes comprehensive and complex remote sensing tasks. Falcon demonstrates powerful understanding and reasoning abilities at the image, region, and pixel levels. Specifically, given simple natural language instructions and remote sensing images, Falcon can produce impressive results in text form across 14 distinct tasks, including image classification, object detection, segmentation, and image captioning. To facilitate Falcon's training and strengthen its capacity to encode rich spatial and semantic information, we developed Falcon_SFT, a large-scale, multi-task, instruction-tuning dataset in the field of remote sensing. The Falcon_SFT dataset consists of approximately 78 million high-quality data samples, covering 5.6 million multi-spatial-resolution and multi-view remote sensing images with diverse instructions. It features hierarchical annotations and undergoes manual sampling verification to ensure high data quality and reliability. Extensive comparative experiments verify that Falcon achieves remarkable performance over 67 datasets and 14 tasks, despite having only 0.7B parameters. We release the complete dataset, code, and model weights at https://github.com/TianHuiLab/Falcon, hoping to support further development in the open-source community.

CL · May 30, 2025

CASPER: A Large Scale Spontaneous Speech Dataset

Cihan Xiao, Ruixing Liang, Xiangyu Zhang et al.

The success of large language models has driven interest in developing similar speech processing capabilities. However, a key challenge is the scarcity of high-quality spontaneous speech data, as most existing datasets contain scripted dialogues. To address this, we present a novel pipeline for eliciting and recording natural dialogues and release our dataset with 100+ hours of spontaneous speech. Our approach fosters fluid, natural conversations while encouraging a diverse range of topics and interactive exchanges. Unlike traditional methods, it facilitates genuine interactions and provides a reproducible framework for future data collection. This paper introduces our dataset and methodology, laying the groundwork for addressing the shortage of spontaneous speech data. We plan to expand this dataset in future stages, offering a growing resource for the research community.