Kaixuan Jiang

CV
h-index54
4papers
33citations
Novelty50%
AI Score37

4 Papers

CVAug 12, 2022
dual unet:a novel siamese network for change detection with cascade differential fusion

Kaixuan Jiang, Ja Liu, Fang Liu et al.

Change detection (CD) of remote sensing images is to detect the change region by analyzing the difference between two bitemporal images. It is extensively used in land resource planning, natural hazards monitoring and other fields. In our study, we propose a novel Siamese neural network for change detection task, namely Dual-UNet. In contrast to previous individually encoded the bitemporal images, we design an encoder differential-attention module to focus on the spatial difference relationships of pixels. In order to improve the generalization of networks, it computes the attention weights between any pixels between bitemporal images and uses them to engender more discriminating features. In order to improve the feature fusion and avoid gradient vanishing, multi-scale weighted variance map fusion strategy is proposed in the decoding stage. Experiments demonstrate that the proposed approach consistently outperforms the most advanced methods on popular seasonal change detection datasets.

CVMar 14, 2025
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

Kaixuan Jiang, Yang Liu, Weixing Chen et al.

Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions. However, current EQA approaches suffer from critical limitations in exploration efficiency, dataset design, and evaluation metrics. Moreover, existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning, while frontier-based exploration strategies struggle in cluttered environments and fail to ensure fine-grained exploration of task-relevant areas. To address these challenges, we construct the EXPloration-awaRe Embodied queStion anSwering Benchmark (EXPRESS-Bench), the largest dataset designed specifically to evaluate both exploration and reasoning capabilities. EXPRESS-Bench consists of 777 exploration trajectories and 2,044 question-trajectory pairs. To improve exploration efficiency, we propose Fine-EQA, a hybrid exploration model that integrates frontier-based and goal-oriented navigation to guide agents toward task-relevant regions more effectively. Additionally, we introduce a novel evaluation metric, Exploration-Answer Consistency (EAC), which ensures faithful assessment by measuring the alignment between answer grounding and exploration reliability. Extensive experimental comparisons with state-of-the-art EQA models demonstrate the effectiveness of our EXPRESS-Bench in advancing embodied exploration and question reasoning.

CVFeb 1, 2024
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments

Yang Liu, Xinshuai Song, Kaixuan Jiang et al.

With the surge in the development of large language models, embodied intelligence has attracted increasing attention. Nevertheless, prior works on embodied intelligence typically encode scene or historical memory in an unimodal manner, either visual or linguistic, which complicates the alignment of the model's action planning with embodied control. To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions. Specifically, we propose a novel Multimodal Environment Memory (MEM) module, facilitating the integration of embodied control with large models through the visual-language memory of scenes. This capability enables MEIA to generate executable action plans based on diverse requirements and the robot's capabilities. Furthermore, we construct an embodied question answering dataset based on a dynamic virtual cafe environment with the help of the large language model. In this virtual environment, we conduct several experiments, utilizing multiple large models through zero-shot learning, and carefully design scenarios for various situations. The experimental results showcase the promising performance of our MEIA in various embodied interactive tasks.

CVJan 25
Bridging Supervision Gaps: A Unified Framework for Remote Sensing Change Detection

Kaixuan Jiang, Chen Wu, Zhenghui Zhao et al.

Change detection (CD) aims to identify surface changes from multi-temporal remote sensing imagery. In real-world scenarios, Pixel-level change labels are expensive to acquire, and existing models struggle to adapt to scenarios with diverse annotation availability. To tackle this challenge, we propose a unified change detection framework (UniCD), which collaboratively handles supervised, weakly-supervised, and unsupervised tasks through a coupled architecture. UniCD eliminates architectural barriers through a shared encoder and multi-branch collaborative learning mechanism, achieving deep coupling of heterogeneous supervision signals. Specifically, UniCD consists of three supervision-specific branches. In the supervision branch, UniCD introduces the spatial-temporal awareness module (STAM), achieving efficient synergistic fusion of bi-temporal features. In the weakly-supervised branch, we construct change representation regularization (CRR), which steers model convergence from coarse-grained activations toward coherent and separable change modeling. In the unsupervised branch, we propose semantic prior-driven change inference (SPCI), which transforms unsupervised tasks into controlled weakly-supervised path optimization. Experiments on mainstream datasets demonstrate that UniCD achieves optimal performance across three tasks. It exhibits significant accuracy improvements in weakly and unsupervised scenarios, surpassing current state-of-the-art by 12.72% and 12.37% on LEVIR-CD, respectively.