Jianing Chen

CV
h-index: 15
14 papers
139 citations
Novelty: 43%
AI Score: 51

14 Papers

OC · May 21
Robust Nash equilibrium seeking based on semi-Markov switching topologies

Jianing Chen, Sitian Qin, Chuangyin Dang

This paper investigates a distributed robust Nash Equilibrium (NE) seeking problem for second-order players subject to external disturbances and uncertain dynamics while communicating via semi-Markov switching topologies. To address these concerns, the following objectives must be achieved simultaneously: (1) rejection of disturbances and uncertain dynamics in finite time; (2) NE seeking for the second-order players; (3) distributed estimation of the actions of non-neighboring players under semi-Markov switching. By combining supertwisting-based Integral Sliding-Mode Control (ISMC) with a leader-follower consensus protocol, a novel robust NE seeking algorithm is constructed. Furthermore, to reduce unnecessary information transmission, a sampled-data-based event-triggered mechanism is introduced. Combining the advantages of semi-Markov switching and the event-triggered mechanism, a second NE seeking algorithm is proposed. Theoretical analysis via a Lyapunov-Krasovskii functional proves that leader-follower consensus is achieved in the mean-square sense. Finally, a connectivity control game is formulated to validate the algorithms.
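
The super-twisting idea named in the abstract above can be illustrated on a scalar toy problem. The following is a minimal sketch, not the paper's algorithm: the system s' = u + d(t), the disturbance 0.5·sin(t), the gains k1 and k2, and the function name are all assumptions chosen for illustration. The gains are picked so that k2 exceeds the bound on |d'| and k1 exceeds sqrt(k2 + bound), a commonly cited sufficient setting.

```python
import math


def simulate_super_twisting(k1=2.0, k2=1.5, s0=1.0, dt=1e-3, t_end=10.0):
    """Euler simulation of s' = u + d(t) under a super-twisting control law.

    Hypothetical toy setup: d(t) = 0.5*sin(t), so |d'(t)| <= 0.5.
    Returns the list of sliding-variable values s after each step.
    """
    s, v, t = s0, 0.0, 0.0
    history = []
    for _ in range(int(round(t_end / dt))):
        d = 0.5 * math.sin(t)                       # assumed disturbance
        sign_s = (s > 0) - (s < 0)                  # sign(s) in {-1, 0, 1}
        u = -k1 * math.sqrt(abs(s)) * sign_s + v    # continuous control term
        v += -k2 * sign_s * dt                      # integral of the switching term
        s += (u + d) * dt                           # explicit Euler step
        t += dt
        history.append(s)
    return history


if __name__ == "__main__":
    trace = simulate_super_twisting()
    print("max |s| over the last second:", max(abs(x) for x in trace[-1000:]))
```

Because the discontinuous sign term sits under an integrator, the control signal u stays continuous, which is the usual motivation for choosing super-twisting over first-order sliding-mode control.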

CV · Jan 8
TEA: Temporal Adaptive Satellite Image Semantic Segmentation

Juyuan Kang, Hao Zhu, Yan Zhu et al.

Crop mapping based on satellite image time series (SITS) holds substantial economic value in agricultural production settings, in which parcel segmentation is an essential step. Existing approaches have achieved notable advancements in SITS segmentation with predetermined sequence lengths. However, we found that these approaches overlook the generalization capability of models across scenarios with varying temporal lengths, leading to markedly poor segmentation results in such cases. To address this issue, we propose TEA, a TEmporal Adaptive SITS semantic segmentation method that enhances the model's resilience under varying sequence lengths. We introduce a teacher model that encapsulates the global sequence knowledge to guide a student model with adaptive temporal input lengths. Specifically, the teacher shapes the student's feature space from the perspectives of intermediate embeddings, prototypes, and soft labels to realize knowledge transfer, while dynamically aggregating student models to mitigate knowledge forgetting. Finally, we introduce full-sequence reconstruction as an auxiliary task to further enhance the quality of representations across inputs of varying temporal lengths. Through extensive experiments, we demonstrate that our method brings remarkable improvements across inputs of different temporal lengths on common benchmarks. Our code will be publicly available.
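
The soft-label part of teacher-student transfer mentioned above is commonly realized as a temperature-scaled KL divergence between teacher and student class distributions. The sketch below shows that generic formulation only; whether TEA uses this exact loss is not stated in the abstract, and the function names and the temperature value are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Generic distillation loss for one pixel or sample (hypothetical helper,
    not taken from the TEA code base).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    print(soft_label_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical logits
    print(soft_label_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # mismatched logits
```

The loss vanishes when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pulls the student's predictions for a shortened sequence toward the teacher's full-sequence predictions.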

OC · Apr 10
Continuous-Time Distributed Seeking for Variational Generalized Nash Equilibrium of Online Game

Jianing Chen, Sichen Qian, Chuangyin Dang et al.

This paper mainly investigates a class of distributed Variational Generalized Nash Equilibrium (VGNE) seeking problems for both online noncooperative games and online aggregative games with time-varying coupling inequality constraints. Two novel continuous-time distributed VGNE seeking algorithms are proposed, which realize the constant regret bound and sublinear fit bound, superior to those of the criteria for online optimization problems and online games. Furthermore, to reduce unnecessary communication among players, a dynamic event-triggered mechanism involving internal variables is introduced into the distributed VGNE seeking algorithm, while the constant regret bound and sublinear fit bound are still maintained. Also, the Zeno behavior is strictly prohibited. Moreover, we further investigate the impact of communication noise on the player's measurement of its neighbors' relative states. It is demonstrated that both the regret and fit bounds remain valid as long as the noise level is not excessively large. This result reveals, to some extent, the proposed algorithm's noise-resilient capability. Finally, an online Uncrewed Aerial Vehicle (UAV) swarm game and an online Nash-Cournot game are given to demonstrate the validity of the theoretical results.

CV · Jan 21
Three-dimensional visualization of X-ray micro-CT with large-scale datasets: Efficiency and accuracy for real-time interaction

Yipeng Yin, Rao Yao, Qingying Li et al.

As Micro-CT technology continues to refine its characterization of material microstructures, industrial CT ultra-precision inspection is generating increasingly large datasets, necessitating solutions to the trade-off between accuracy and efficiency in the 3D characterization of defects during ultra-precise detection. This article provides a unique perspective on recent advances in accurate and efficient 3D visualization using Micro-CT, tracing its evolution from medical imaging to industrial non-destructive testing (NDT). Among the numerous CT reconstruction and volume rendering methods, this article selectively reviews and analyzes approaches that balance accuracy and efficiency, offering a comprehensive analysis to help researchers quickly grasp highly efficient and accurate 3D reconstruction methods for microscopic features. By comparing the principles of computed tomography with advancements in microstructural technology, this article examines the evolution of CT reconstruction algorithms from analytical methods to deep learning techniques, as well as improvements in volume rendering algorithms, acceleration, and data reduction. Additionally, it explores advanced lighting models for high-accuracy, photorealistic, and efficient volume rendering. Furthermore, this article envisions potential directions in CT reconstruction and volume rendering. It aims to guide future research in quickly selecting efficient and precise methods and in developing new ideas and approaches for real-time online monitoring of internal material defects through virtual-physical interaction, with a view to applying digital twin models to structural health monitoring (SHM).

CV · Jun 11, 2025
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

Jianing Chen, Zehao Li, Yujun Cai et al.

Reconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. While 3D Gaussian Splatting (3DGS) achieves real-time rendering in static settings, extending it to dynamic scenes is challenging due to the difficulty of learning structured and temporally consistent motion representations. This challenge often manifests as three limitations in existing methods: redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations. These issues collectively hinder coherent and efficient dynamic reconstruction. To address these limitations, we propose HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation. It first identifies motion-relevant regions via an Anchor Filter to suppress redundant updates in static areas. A self-supervised Induced Flow-Guided Deformation module induces anchor motion using multi-frame feature aggregation, eliminating the need for explicit flow labels. To further handle fine-grained deformations, a Hierarchical Anchor Propagation mechanism increases anchor resolution based on motion complexity and propagates multi-level transformations. Extensive experiments on synthetic and real-world benchmarks validate that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.

GR · May 28, 2025
STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering

Zehao Li, Hao Jiang, Yujun Cai et al.

Although dynamic scene reconstruction has long been a fundamental challenge in 3D vision, the recent emergence of 3D Gaussian Splatting (3DGS) offers a promising direction by enabling high-quality, real-time rendering through explicit Gaussian primitives. However, existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal incoherence during initialization, where canonical Gaussians are constructed by aggregating observations from multiple frames without temporal distinction. This results in spatio-temporally entangled representations, making it difficult to model dynamic motion accurately. To overcome this limitation, we propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each Gaussian. STDR introduces a spatio-temporal mask, a separated deformation field, and a consistency regularization to jointly disentangle spatial and temporal patterns. Extensive experiments demonstrate that incorporating our module into existing 3DGS-based dynamic scene reconstruction frameworks leads to notable improvements in both reconstruction quality and spatio-temporal consistency across synthetic and real-world benchmarks.

CV · Jun 16, 2025
DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo

Zhenlong Yuan, Dapeng Zhang, Zehao Li et al.

Recently, patch deformation-based methods have demonstrated significant effectiveness in multi-view stereo due to their incorporation of deformable and expandable perception for reconstructing textureless areas. However, these methods generally focus on identifying reliable pixel correlations to mitigate the matching ambiguity of patch deformation, while neglecting the deformation instability caused by edge-skipping and visibility occlusions, which may cause estimation deviations. To address these issues, we propose DVP-MVS++, an innovative approach that synergizes both depth-normal-edge aligned and harmonized cross-view priors for robust and visibility-aware patch deformation. Specifically, to avoid edge-skipping, we first apply DepthPro, Metric3Dv2 and the Roberts operator to generate coarse depth maps, normal maps and edge maps, respectively. These maps are then aligned via an erosion-dilation strategy to produce fine-grained homogeneous boundaries that facilitate robust patch deformation. Moreover, we reformulate view selection weights as visibility maps, and then implement both an enhanced cross-view depth reprojection and an area-maximization strategy to help reliably restore visible areas and effectively balance deformed patches, thus acquiring harmonized cross-view priors for visibility-aware patch deformation. Additionally, we obtain geometry consistency by adopting both aggregated normals via view selection and projection depth differences via epipolar lines, and then employ SHIQ for highlight correction to enable geometry consistency with highlight-aware perception, thus improving reconstruction quality during the propagation and refinement stages. Evaluation results on the ETH3D, Tanks & Temples and Strecha datasets demonstrate the state-of-the-art performance and robust generalization capability of our proposed method.

CV · Oct 3, 2025
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting

Jianing Chen, Zehao Li, Yujun Cai et al.

Dynamic 3D reconstruction from monocular videos remains difficult due to the ambiguity of inferring 3D motion from limited views and the computational demands of modeling temporally varying scenes. While recent sparse control methods alleviate computation by reducing millions of Gaussians to thousands of control points, they suffer from a critical limitation: they allocate points purely by geometry, leading to static redundancy and dynamic insufficiency. We propose a motion-adaptive framework that aligns control density with motion complexity. Leveraging semantic and motion priors from vision foundation models, we establish patch-token-node correspondences and apply motion-adaptive compression to concentrate control points in dynamic regions while suppressing redundancy in static backgrounds. Our approach achieves flexible representational density adaptation through iterative voxelization and motion tendency scoring, directly addressing the fundamental mismatch between control point allocation and motion complexity. To capture temporal evolution, we introduce spline-based trajectory parameterization initialized by 2D tracklets, replacing MLP-based deformation fields to achieve smoother motion representation and more stable optimization. Extensive experiments demonstrate significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.
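
To make the idea of a spline-parameterized trajectory concrete, the sketch below evaluates a uniform Catmull-Rom spline through a short list of 3D knots. The abstract does not say which spline family the paper uses, so the choice of Catmull-Rom, the function names, and the knot values are assumptions made purely for illustration.

```python
def catmull_rom(p0, p1, p2, p3, u):
    """Point on the uniform Catmull-Rom segment between p1 and p2, u in [0, 1]."""
    u2, u3 = u * u, u * u * u
    return tuple(
        0.5 * (2 * b
               + (-a + c) * u
               + (2 * a - 5 * b + 4 * c - d) * u2
               + (-a + 3 * b - 3 * c + d) * u3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def trajectory(knots, t):
    """Evaluate a control-point trajectory at time t in [0, len(knots) - 3].

    Hypothetical helper: knots[1..-2] are interpolated, while the first and
    last knots only shape the end tangents.
    """
    segment = min(int(t), len(knots) - 4)
    return catmull_rom(*knots[segment:segment + 4], t - segment)


if __name__ == "__main__":
    knots = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0),
             (3.0, 1.0, 1.0), (4.0, 0.0, 1.0)]
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, trajectory(knots, t))
```

A spline like this has only a handful of parameters per control point and is smooth by construction, which is one plausible reason a trajectory model of this kind can optimize more stably than a per-timestep MLP deformation field.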

LG · May 8, 2023
Mlinear: Rethink the Linear Model for Time-series Forecasting

Wei Li, Xiangxu Meng, Chuhao Chen et al.

Recently, significant advancements have been made in time-series forecasting research, with an increasing focus on analyzing the nature of time-series data, e.g., channel-independence (CI) and channel-dependence (CD), rather than solely focusing on designing sophisticated forecasting models. However, current research has primarily focused on either CI or CD in isolation, and the challenge of effectively combining these two opposing properties to achieve a synergistic effect remains unresolved. In this paper, we carefully examine the opposing properties of CI and CD, and raise a practical question that has not been effectively answered: "How to effectively mix the CI and CD properties of time series to achieve better predictive performance?" To answer this question, we propose Mlinear (MIX-Linear), a simple yet effective method based mainly on linear layers. The design philosophy of Mlinear mainly includes two aspects: (1) dynamically tuning the CI and CD properties based on the time semantics of different input time series, and (2) providing deep supervision to adjust the individual performance of the "CI predictor" and "CD predictor". In addition, empirically, we introduce a new loss function that significantly outperforms the widely used mean squared error (MSE) on multiple datasets. Experiments on widely used time-series datasets covering multiple fields have demonstrated the superiority of our method over PatchTST, which is the latest Transformer-based method, in terms of the MSE and MAE metrics on 7 datasets with identical sequence inputs (336 or 512). Specifically, our method significantly outperforms PatchTST with a ratio of 21:3 at 336 sequence length input and 29:10 at 512 sequence length input. Additionally, our approach has a 10× efficiency advantage at the unit level, taking into account both training and inference times.

CV · May 26, 2021
Direct Servo Control from In-Sensor CNN Inference with A Pixel Processor Array

Yanan Liu, Jianing Chen, Laurie Bose et al.

This work demonstrates direct visual sensory-motor control using high-speed CNN inference via a SCAMP-5 Pixel Processor Array (PPA). We demonstrate how PPAs are able to efficiently bridge the gap between perception and action. A binary Convolutional Neural Network (CNN) is used for a classic rock, paper, scissors classification problem at over 8000 FPS. Control instructions are directly sent to a servo motor from the PPA according to the CNN's classification result without any other intermediate hardware.

RO · May 21, 2021
Bringing A Robot Simulator to the SCAMP Vision System

Yanan Liu, Jianing Chen, Laurie Bose et al.

This work develops and demonstrates the integration of the SCAMP-5d vision system into the CoppeliaSim robot simulator, creating a semi-simulated environment. By configuring a camera in the simulator and setting up communication with the SCAMP Python host through the remote API, sensor images from the simulator can be transferred to the SCAMP vision sensor, where on-sensor image processing such as CNN inference can be performed. SCAMP output is then fed back into CoppeliaSim. This proposed platform integration enables rapid prototyping and validation of SCAMP algorithms for robotic systems. We demonstrate a car localisation and tracking task using this proposed semi-simulated platform, with CNN inference on SCAMP commanding the motion of a robot. We have made this platform available online.

RO · Sep 27, 2020
Agile Reactive Navigation for A Non-Holonomic Mobile Robot Using A Pixel Processor Array

Yanan Liu, Laurie Bose, Colin Greatwood et al.

This paper presents an agile reactive navigation strategy for driving a non-holonomic ground vehicle around a preset course of gates in a cluttered environment using a low-cost processor array sensor. This enables machine vision tasks to be performed directly upon the sensor's image plane, rather than using a separate general-purpose computer. We demonstrate a small ground vehicle running through or avoiding multiple gates at high speed using minimal computational resources. To achieve this, target tracking algorithms are developed for the Pixel Processor Array, and captured images are then processed directly on the vision sensor to acquire target information for controlling the ground vehicle. The algorithm can run at up to 2000 fps outdoors and 200 fps at indoor illumination levels. Conducting image processing at the sensor level avoids the bottleneck of image transfer encountered in conventional sensors. The real-time performance and robustness of on-board image processing are validated through experiments. Experimental results demonstrate the algorithm's ability to enable a ground vehicle to navigate at an average speed of 2.20 m/s when passing through multiple gates and 3.88 m/s for a 'slalom' task in an environment featuring significant visual clutter.

CV · Apr 27, 2020
Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays

Laurie Bose, Jianing Chen, Stephen J. Carey et al.

We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities. PPA sensors consist of an array of processing elements (PEs), with each PE capable of light capture, data storage and computation, allowing various computer vision processing to be executed directly upon the sensor device. The key idea behind our approach is storing network weights "in-pixel" within the PEs of the PPA sensor itself to allow various computations, such as multiple different image convolutions, to be carried out in parallel. Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely upon the PPA sensor, while leaving no untapped computational resources. This is in contrast to previous works that only use sensor-level processing to sequentially compute image convolutions, and must transfer data to an external digital processor to complete the computation. We demonstrate our approach on the SCAMP-5 vision system, performing inference of an MNIST digit classification network at over 3000 frames per second and over 93% classification accuracy. This is the first work demonstrating CNN inference conducted entirely upon the processor array of a PPA vision sensor device, requiring no external processing.

CV · Sep 12, 2019
A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays

Laurie Bose, Jianing Chen, Stephen J. Carey et al.

We present a convolutional neural network implementation for pixel processor array (PPA) sensors. PPA hardware consists of a fine-grained array of general-purpose processing elements, each capable of light capture, data storage, program execution, and communication with neighboring elements. This allows images to be stored and manipulated directly at the point of light capture, rather than having to transfer images to external processing hardware. Our CNN approach divides this array up into 4x4 blocks of processing elements, essentially trading off image resolution for increased local memory capacity per 4x4 "pixel". We implement parallel operations for image addition, subtraction, and bit-shifting in this 4x4 block format. Using these components we formulate how to perform ternary weight convolutions upon these images, compactly store results of such convolutions, perform max-pooling, and transfer the resulting sub-sampled data to an attached micro-controller. We train ternary weight filter CNNs for digit recognition and a simple tracking task, and demonstrate inference of these networks upon the SCAMP5 PPA system. This work represents a first step towards embedding neural network processing capability directly onto the focal plane of a sensor.
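
The appeal of ternary weights is that a convolution then needs no multiplications: each kernel tap either adds a shifted copy of the image, subtracts it, or skips it. The following is a minimal host-side sketch of that arithmetic in plain Python. It does not model the SCAMP5 hardware or the 4x4 block layout, and the function name is an assumption.

```python
def ternary_conv2d(image, kernel):
    """Valid 2D correlation with weights in {-1, 0, +1}, using only add/subtract.

    image and kernel are lists of lists of numbers. Each nonzero kernel tap
    contributes one shifted copy of the image, added or subtracted as a whole,
    which mirrors how a parallel array would accumulate shifted images.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for dy in range(kh):
        for dx in range(kw):
            w = kernel[dy][dx]
            if w not in (-1, 0, 1):
                raise ValueError("kernel weights must be -1, 0 or +1")
            if w == 0:
                continue  # zero taps cost nothing
            for y in range(oh):
                for x in range(ow):
                    if w > 0:
                        out[y][x] += image[y + dy][x + dx]
                    else:
                        out[y][x] -= image[y + dy][x + dx]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]  # diagonal difference filter
    print(ternary_conv2d(image, kernel))
```

On a processor array the two inner loops would collapse into a single whole-image add or subtract per nonzero tap, so the cost scales with the number of nonzero weights rather than with image size.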