SY Feb 3, 2017
Adaptive Adjustment of Noise Covariance in Kalman Filter for Dynamic State Estimation
Shahrokh Akhlaghi, Ning Zhou, Zhenyu Huang
Accurate estimation of the dynamic states of a synchronous machine (e.g., the rotor's angle and speed) is essential in monitoring and controlling transient stability of a power system. It is well known that the covariance matrices of process noise (Q) and measurement noise (R) have a significant impact on the Kalman filter's performance in estimating dynamic states. The conventional ad hoc approaches for estimating the covariance matrices are not adequate for achieving the best filtering performance. To address this problem, this paper proposes an adaptive filtering approach that estimates Q and R from the innovation and residual to improve the dynamic state estimation accuracy of the extended Kalman filter (EKF). Simulation on the two-area model shows that the proposed estimation method is more robust against initial errors in Q and R than the conventional method in estimating the dynamic states of a synchronous machine.
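As a rough illustration of innovation- and residual-based adaptation, the sketch below runs one step of a linear Kalman filter and then refreshes Q and R. The abstract does not state the update equations, so the forgetting-factor form used here, the parameter `alpha`, and the function name `adaptive_kf_step` are assumptions, and a linear model stands in for the EKF's linearized one.

```python
import numpy as np

def adaptive_kf_step(x, P, z, F, H, Q, R, alpha=0.3):
    """One Kalman step followed by an adaptive refresh of Q and R.

    alpha is a hypothetical forgetting factor in [0, 1): larger values
    trust the previous Q and R more than the newest sample.
    """
    # Prediction
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # Innovation: measurement minus predicted measurement
    d = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Correction
    x_new = x_pred + K @ d
    P_new = (np.eye(len(x)) - K @ H) @ P_pred

    # Residual: measurement minus corrected measurement
    eps = z - H @ x_new

    # Assumed update rules: R from the residual, Q from the innovation
    R_new = alpha * R + (1 - alpha) * (np.outer(eps, eps) + H @ P_new @ H.T)
    Q_new = alpha * Q + (1 - alpha) * (K @ np.outer(d, d) @ K.T)
    return x_new, P_new, Q_new, R_new

# Usage with a toy constant-velocity model (all values are illustrative)
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[1.0]])
x, P, Q, R = adaptive_kf_step(x, P, np.array([0.5]), F, H, Q, R)
```

Both updates add positive semidefinite terms to a scaled copy of the previous matrix, so Q and R stay valid covariance matrices as long as they start that way.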
SY Feb 1, 2017
Adaptive Multi-Step Prediction based EKF to Power System Dynamic State Estimation
Shahrokh Akhlaghi, Ning Zhou
Power system dynamic state estimation is essential to monitoring and controlling power system stability. Kalman filtering approaches are predominant in estimation of synchronous machine dynamic states (i.e., rotor angle and rotor speed). This paper proposes an adaptive multi-step prediction (AMSP) approach to improve the performance of the extended Kalman filter (EKF) in estimating the dynamic states of a synchronous machine. The proposed approach consists of three major steps. First, two indexes are defined to quantify the non-linearity levels of the state transition function and measurement function, respectively. Second, based on the non-linearity indexes, a multi-prediction factor (Mp) is defined to determine the number of prediction steps. Finally, to mitigate the impact of non-linearity on dynamic state estimation (DSE) accuracy, the prediction step is repeated a few times, based on Mp, before the correction step is performed. The two-area four-machine system is used to evaluate the effectiveness of the proposed AMSP approach. It is shown through the Monte Carlo method that the proposed AMSP approach achieves a good trade-off between estimation accuracy and computational time.
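The repeated prediction step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the non-linearity indexes or how Mp is computed, so `mp` is passed in directly, the function name `multi_step_predict` is hypothetical, and splitting the process noise evenly across sub-steps is a simplifying choice made here.

```python
import numpy as np

def multi_step_predict(x, P, f, jac, Q, dt, mp):
    """Run the EKF prediction mp times over sub-intervals of length dt / mp.

    f(x, h)   -> state propagated over a step of length h
    jac(x, h) -> Jacobian of f with respect to x for that step
    """
    h = dt / mp
    for _ in range(mp):
        F = jac(x, h)             # linearize at the current estimate
        x = f(x, h)               # propagate the state
        P = F @ P @ F.T + Q / mp  # assumption: noise split evenly per sub-step
    return x, P

# Toy swing-like dynamics, Euler-discretized (illustrative only)
def f(x, h):
    return np.array([x[0] + h * x[1], x[1] - h * np.sin(x[0])])

def jac(x, h):
    return np.array([[1.0, h], [-h * np.cos(x[0]), 1.0]])

x0, P0, Q = np.array([0.5, 0.0]), 0.1 * np.eye(2), 0.01 * np.eye(2)
x_pred, P_pred = multi_step_predict(x0, P0, f, jac, Q, dt=0.1, mp=4)
```

With `mp=1` the function reduces to the ordinary single EKF prediction; larger values re-linearize more often at the cost of extra Jacobian evaluations, which is the accuracy versus computation trade-off the abstract refers to.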
76.9 CV May 20
$Δ$ynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos
Chia-Hsiang Kao, Cong Phuoc Huynh, Chien-Yi Wang et al.
Inferring rigid-body physical states and properties from monocular videos is a fundamental step toward physics-based perception and simulation. Existing approaches assume specific underlying physical systems, object types, and camera poses, making them unable to generalize to complex real-world settings. We introduce $Δ$YNAMICS, a vision-language framework that uses language as a unified representation of rigid-body dynamics. Instead of directly predicting parameters, $Δ$YNAMICS generates scene configurations in a structured text format for physics simulation. We enhance the model's generalization by integrating natural language motion reasoning and leveraging optical flow as a semantic-agnostic input. On the CLEVRER dataset, $Δ$YNAMICS achieves a segmentation IoU of 0.30, a 7x improvement over leading VLMs (InternVL3-8B, Qwen2.5-VL-7B and Claude-4-Sonnet). Additionally, test-time sampling and evolutionary search further boost performance by 27% and 120% in segmentation IoU, respectively. Finally, we demonstrate strong transfer to a new dataset of 235 real-world rigid-body videos, highlighting the potential of language-driven physics inference for bridging perception and simulation.
97.8 CV Mar 26
Reinforcing Structured Chain-of-Thought for Video Understanding
Peiyao Wang, Haotian Xu, Noranart Vesdapunt et al.
Multi-modal Large Language Models (MLLMs) show promise in video understanding. However, their reasoning often suffers from thinking drift and weak temporal comprehension, even when enhanced by Reinforcement Learning (RL) techniques like Group Relative Policy Optimization (GRPO). Moreover, existing RL methods usually depend on Supervised Fine-Tuning (SFT), which requires costly Chain-of-Thought (CoT) annotation and multi-stage training, and enforces fixed reasoning paths, limiting MLLMs' ability to generalize and potentially inducing bias. To overcome these limitations, we introduce Summary-Driven Reinforcement Learning (SDRL), a novel single-stage RL framework that obviates the need for SFT by utilizing a Structured CoT format: Summarize -> Think -> Answer. SDRL introduces two self-supervised mechanisms integrated into the GRPO objective: 1) Consistency of Vision Knowledge (CVK) enforces factual grounding by reducing KL divergence among generated summaries; and 2) Dynamic Variety of Reasoning (DVR) promotes exploration by dynamically modulating thinking diversity based on group accuracy. This novel integration effectively balances alignment and exploration, supervising both the final answer and the reasoning process. Our method achieves state-of-the-art performance on seven public VideoQA datasets.
7.0 RO May 15
Beyond Collision Avoidance: Multi-Robot Yielding and Spatial Affordance in Emergency Evacuations
Ning Zhou, Edmund R. Hunt, Nikolai W. F. Bode
As mobile service robots increasingly coexist with pedestrians, ensuring passively safe behaviour during confined emergency evacuations is critical. Existing multi-robot yielding strategies often focus solely on collision avoidance and macroscopic flow optimisation, overlooking environmental affordances and human spatial expectations. To bridge the gap between macroscopic theory and micro-level perception, we conducted a game-based virtual evacuation experiment (N=56). We investigated individual psychological responses to four multi-robot yielding strategies (Hide, LineEscape, Freeze, ShortestPath) across confined corridors with and without refuge niches. Our results establish a robust preference hierarchy (Hide > LineEscape > Freeze > ShortestPath), demonstrating that proactive space-yielding significantly outperforms freezing and efficiency-first approaches. Crucially, we found that environmental affordances heavily shape cognitive expectations. Actively utilising available niches amplifies the psychological comfort of proactive yielding (Hide). Conversely, failing to use an obvious niche (e.g., executing LineEscape) may trigger Expectation Violation. This is reflected in a drastically increased perceived cognitive delay, despite objectively unimpeded trajectories. Furthermore, prior robot interaction experience helps users decode complex social intents. Ultimately, this research demonstrates that safe human-robot interaction during emergencies must evolve from pure trajectory optimisation to semantically aware navigation. Future work will extend this framework to investigate complex interactions between robot swarms and pedestrian crowds.
15.6 MA May 15
Multi-Agent Cooperative Transportation: Optimal and Efficient Task Allocation and Path Finding
Ning Zhou, Nikolai W. F. Bode, Edmund R. Hunt
Multi-robot systems are integral to modern logistics, but their capabilities are often limited to tasks executable by individual agents. This paper addresses a critical gap in existing frameworks like Multi-Agent Path Finding (MAPF) and Task Allocation and Path Finding (TAPF), which lack true cooperation for transporting large items that require multiple agents. To this end, we formalise the Cooperative Transportation Task Allocation and Path Finding (CT-TAPF) problem, which integrates team formation, task assignment, and collision-free pathfinding. We present an optimal solver, Cooperative Transportation Task Conflict-Based Search (CT-TCBS), which features a novel Incremental Expansion strategy to tackle the combinatorial explosion inherent in team formation. Recognising the computational cost of optimality, we also develop a family of sub-optimal solvers that employ a global, task-centric perspective, selecting the next task to assign based on a global difficulty metric (Best Task or Worst Task). Our comprehensive empirical evaluation demonstrates three key findings: (1) the incremental expansion strategy significantly outperforms the naive combinatorial approach by successfully pruning the dominant task-allocation search space; (2) we identify a task-conflict expansion dilemma, where sophisticated conflict resolvers effective for large-agent pathfinding subproblems can be detrimental in the integrated CT-TAPF setting; and (3) our proposed sub-optimal solvers establish a new, more efficient frontier on the solution quality-runtime spectrum compared to "nn-" agent-centric baselines. This work provides a foundational framework and a set of effective algorithms for a new, practical class of cooperative multi-agent problems.
IV Apr 15, 2024
Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection
Lisang Zhou, Meng Wang, Ning Zhou
Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis while protecting patient privacy, which is crucial for achieving efficient medical image analysis and accelerating medical research progress. This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to address the dual challenges of data privacy and efficient disease diagnosis. Traditional Centralized Machine Learning models, despite their widespread use in medical imaging for tasks such as disease diagnosis, raise significant privacy concerns due to the sensitive nature of patient data. As an alternative, FL emerges as a promising solution by allowing the training of a collective global model across local clients without centralizing the data, thus preserving privacy. Focusing on the application of FL in Magnetic Resonance Imaging (MRI) brain tumor detection, this study demonstrates the effectiveness of the Federated Learning framework coupled with EfficientNet-B0 and the FedAvg algorithm in enhancing both privacy and diagnostic accuracy. Through a meticulous selection of preprocessing methods, algorithms, and hyperparameters, and a comparative analysis of various Convolutional Neural Network (CNN) architectures, the research uncovers optimal strategies for image classification. The experimental results reveal that EfficientNet-B0 outperforms other models like ResNet in handling data heterogeneity and achieving higher accuracy and lower loss, highlighting the potential of FL in overcoming the limitations of traditional models. The study underscores the significance of addressing data heterogeneity and proposes further research directions for broadening the applicability of FL in medical image analysis.
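The aggregation step of FedAvg can be sketched in a few lines: each client's parameters are averaged with a weight proportional to its local sample count. This is a minimal illustration of the general algorithm, not the study's training code; the function name `fedavg` and the dictionary-of-arrays layout are assumptions, and the EfficientNet-B0 model itself is not reproduced.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-count-weighted average of per-client parameter dictionaries.

    client_weights: list of {parameter_name: np.ndarray}, same keys and shapes
    client_sizes:   number of local training samples held by each client
    """
    total = float(sum(client_sizes))
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Two hypothetical clients holding 100 and 300 local images
clients = [{"fc": np.array([1.0, 1.0])}, {"fc": np.array([3.0, 3.0])}]
global_weights = fedavg(clients, [100, 300])
```

Only parameters leave each client, never the images, which is how the approach keeps patient data local while still producing one global model.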
AI Mar 10, 2025
Encoding Argumentation Frameworks to Propositional Logic Systems
Shuai Tang, Jiachao Wu, Ning Zhou
This paper generalizes the encoding of argumentation frameworks beyond the classical 2-valued propositional logic system ($PL_2$) to 3-valued propositional logic systems ($PL_3$s) and fuzzy propositional logic systems ($PL_{[0,1]}s$), employing two key encodings: normal encoding ($ec_1$) and regular encoding ($ec_2$). Specifically, via $ec_1$ and $ec_2$, we establish model relationships between Dung's classical semantics (stable and complete semantics) and the encoded semantics associated with Kleene's $PL_3$ and Łukasiewicz's $PL_3$. Through $ec_1$, we also explore connections between Gabbay's real equational semantics and the encoded semantics of $PL_{[0,1]}s$, including showing that Gabbay's $Eq_{\text{max}}^R$ and $Eq_{\text{inverse}}^R$ correspond to the fuzzy encoded semantics of $PL_{[0,1]}^G$ and $PL_{[0,1]}^P$ respectively. Additionally, we propose a new fuzzy encoded semantics ($Eq^L$) associated with Łukasiewicz's $PL_{[0,1]}$ and investigate interactions between complete semantics and fuzzy encoded semantics. This work strengthens the links between argumentation frameworks and propositional logic systems, providing a framework for constructing new argumentation semantics.
CV Jun 17, 2025
DepthSeg: Depth prompting in remote sensing semantic segmentation
Ning Zhou, Shanxiong Chen, Mingting Zhou et al.
Remote sensing semantic segmentation is crucial for extracting detailed land surface information, enabling applications such as environmental monitoring, land use planning, and resource assessment. In recent years, advancements in artificial intelligence have spurred the development of automatic remote sensing semantic segmentation methods. However, existing semantic segmentation methods focus on distinguishing the spectral characteristics of different objects while ignoring differences in the elevation of the targets. This results in land cover misclassification in complex scenarios involving shadow occlusion and spectral confusion. In this paper, we introduce a depth prompting two-dimensional (2D) remote sensing semantic segmentation framework (DepthSeg). It automatically models depth/height information from 2D remote sensing images and integrates it into the semantic segmentation framework to mitigate the effects of spectral confusion and shadow occlusion. During the feature extraction phase of DepthSeg, we introduce a lightweight adapter to enable cost-effective fine-tuning of the large-parameter vision transformer encoder pre-trained on natural images. In the depth prompting phase, we propose a depth prompter to model depth/height features explicitly. In the semantic prediction phase, we introduce a semantic classification decoder that couples the depth prompts with high-dimensional land-cover features, enabling accurate extraction of land-cover types. Experiments on the LiuZhou dataset validate the advantages of the DepthSeg framework in land cover mapping tasks. Detailed ablation studies further highlight the significance of the depth prompts in remote sensing semantic segmentation.
CV Mar 17, 2025
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
Bin Tang, Keqi Pan, Miao Zheng et al.
In recent years, predicting Big Five personality traits from multimodal data has received significant attention in artificial intelligence (AI). However, existing computational models often fail to achieve satisfactory performance. Psychological research has shown a strong correlation between pose and personality traits, yet previous research has largely ignored pose data in computational models. To address this gap, we develop a novel multimodal dataset that incorporates full-body pose data. The dataset includes video recordings of 287 participants completing a virtual interview with 36 questions, along with self-reported Big Five personality scores as labels. To effectively utilize this multimodal data, we introduce the Psychology-Inspired Network (PINet), which consists of three key modules: Multimodal Feature Awareness (MFA), Multimodal Feature Interaction (MFI), and Psychology-Informed Modality Correlation Loss (PIMC Loss). The MFA module leverages the Vision Mamba Block to capture comprehensive visual features related to personality, while the MFI module efficiently fuses the multimodal features. The PIMC Loss, grounded in psychological theory, guides the model to emphasize different modalities for different personality dimensions. Experimental results show that the PINet outperforms several state-of-the-art baseline models. Furthermore, the three modules of PINet contribute almost equally to the model's overall performance. Incorporating pose data significantly enhances the model's performance, with the pose modality ranking mid-level in importance among the five modalities. These findings address the existing gap in personality-related datasets that lack full-body pose data and provide a new approach for improving the accuracy of personality prediction models, highlighting the importance of integrating psychological insights into AI frameworks.
CV Apr 6, 2020
Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment
Qiuyu Chen, Wei Zhang, Ning Zhou et al.
To leverage deep learning for image aesthetics assessment, one critical but unsolved issue is how to seamlessly incorporate the information of image aspect ratios to learn more robust models. In this paper, an adaptive fractional dilated convolution (AFDC), which is aspect-ratio-embedded, composition-preserving and parameter-free, is developed to tackle this issue natively at the convolutional kernel level. Specifically, the fractional dilated kernel is adaptively constructed according to the image aspect ratio, where interpolation between the two nearest integer-dilated kernels is used to cope with the misalignment of fractional sampling. Moreover, we provide a concise formulation for mini-batch training and utilize a grouping strategy to reduce computational overhead. As a result, it can be easily implemented with common deep learning libraries and plugged into popular CNN architectures in a computation-efficient manner. Our experimental results demonstrate that the proposed method achieves state-of-the-art performance on image aesthetics assessment over the AVA dataset.
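The interpolation between the two nearest integer dilations can be illustrated in one dimension. This is a simplified sketch, not the paper's 2D mini-batch formulation: the helper names, the zero padding, and the linear interpolation weights are assumptions chosen to show the idea, and the dilation rate `r` is assumed to be at least 1.

```python
import numpy as np

def dilated_corr1d(x, k, d):
    """Same-size 1D cross-correlation with integer dilation d (zero padded)."""
    half = (len(k) // 2) * d
    xp = np.pad(x, half)
    out = np.zeros(len(x))
    for j, w in enumerate(k):
        # Tap j of the kernel reads the input at offset (j - len(k)//2) * d
        out += w * xp[j * d : j * d + len(x)]
    return out

def fractional_dilated_corr1d(x, k, r):
    """Blend the outputs of the two integer dilations that bracket r >= 1."""
    lo = int(np.floor(r))
    w_hi = r - lo                 # weight given to the larger dilation
    if w_hi == 0:
        return dilated_corr1d(x, k, lo)
    return ((1 - w_hi) * dilated_corr1d(x, k, lo)
            + w_hi * dilated_corr1d(x, k, lo + 1))

signal = np.arange(8, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])   # odd-length kernel assumed
y = fractional_dilated_corr1d(signal, kernel, r=1.5)
```

Because the same kernel weights are reused for both dilations, the blend adds no learnable parameters, consistent with the parameter-free property claimed in the abstract.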
SE Aug 28, 2019
A Semantic Schema for Data Quality Management in a Multi-Tenant Data Platform
Ning Zhou, Sandra Garcia Esparza, Lars Marius Garshol
Schibsted Media Group is a global marketplace company with presence in more than 20 countries. It is undergoing a digital transformation to convert data silos to a multi-tenant system based on a common data platform. Good data quality based on a common schema on the semantic level is essential for building successful data-driven products across marketplaces. To solve this challenge, we developed the data quality tooling based on a semantic schema management system to support schema evolution with versioning, testing and transformation. It can monitor the data quality requirements for different applications and handle incoming data consisting of multiple schema versions. Today the system is operating in production and processes over one billion events per day for over 100 applications.
IR Sep 6, 2018
Five lessons from building a deep neural network recommender
Simen Eide, Audun M. Øygard, Ning Zhou
Recommendation algorithms are widely adopted in marketplaces to help users find the items they are looking for. The sparsity of the item-by-user matrix and the cold-start issue in marketplaces pose challenges for off-the-shelf recommender systems based on matrix factorization. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper summarizes five lessons we learned from experimenting with state-of-the-art deep learning recommenders at the leading Norwegian marketplace FINN.no. We design a hybrid recommender system that takes the user-generated content of a marketplace (including text, images and meta attributes) and combines it with user behavior data such as page views and messages to provide recommendations for marketplace items. Among the various tactics we experimented with, the following five had the best impact: staged training instead of end-to-end training, leveraging rich user behaviors beyond page views, using user behaviors as noisy labels to train embeddings, using transfer learning to solve the unbalanced data problem, and using attention mechanisms in the hybrid model. This system is currently running in production at FINN.no with a click-through rate of around 20% and serves over one million visitors every day.
IR Sep 6, 2018
Deep neural network marketplace recommenders in online experiments
Simen Eide, Ning Zhou
Recommendations are broadly used in marketplaces to match users with items relevant to their interests and needs. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper focuses on the challenge of measuring recommender performance and summarizes the online experiment results for several promising types of deep neural network recommenders: hybrid item representation models combining features from user engagement and content, sequence-based models, and multi-armed bandit models that optimize user engagement by re-ranking proposals from multiple submodels. The recommenders are currently running in production at the leading Norwegian marketplace FINN.no and serve over one million visitors every day.
ML Sep 24, 2017
Weather Forecasting Error in Solar Energy Forecasting
Hossein Sangrody, Morteza Sarailoo, Ning Zhou et al.
As renewable distributed energy resources (DERs) penetrate the power grid at an accelerating speed, it is essential for operators to have accurate solar photovoltaic (PV) energy forecasting for efficient operations and planning. Generally, observed weather data are applied in the solar PV generation forecasting model while in practice the energy forecasting is based on forecasted weather data. In this paper, a study on the uncertainty in weather forecasting for the most commonly used weather variables is presented. The forecasted weather data for six days ahead is compared with the observed data and the results of analysis are quantified by statistical metrics. In addition, the most influential weather predictors in energy forecasting model are selected. The performance of historical and observed weather data errors is assessed using a solar PV generation forecasting model. Finally, a sensitivity test is performed to identify the influential weather variables whose accurate values can significantly improve the results of energy forecasting.
ML Jul 15, 2017
On the Performance of Forecasting Models in the Presence of Input Uncertainty
Hossein Sangrody, Morteza Sarailoo, Ning Zhou et al.
With the unprecedented penetration of renewable distributed energy resources (DERs), the need for an efficient energy forecasting model is greater than before. Generally, forecasting models are trained using observed weather data, while the trained models are applied for energy forecasting using forecasted weather data. In this study, the performance of several commonly used forecasting methods in the presence of weather predictors with uncertainty is assessed and compared. Accordingly, both observed and forecasted weather data are collected, and the influential predictors for the solar PV generation forecasting model are then selected using several measures. Using observed and forecasted weather data, the uncertainty of the weather variables is analyzed with MAE and bootstrapping. The energy forecasting model is trained using observed weather data, and finally, the performance of several commonly used forecasting methods in solar energy forecasting is simulated and compared for a real case study.
CV Jul 8, 2017
Embedding Visual Hierarchy with Deep Networks for Large-Scale Visual Recognition
Tianyi Zhao, Baopeng Zhang, Wei Zhang et al.
In this paper, a level-wise mixture model (LMM) is developed by embedding visual hierarchy with deep networks to support large-scale visual recognition (i.e., recognizing thousands or even tens of thousands of object classes), and a Bayesian approach is used to adapt a pre-trained visual hierarchy automatically to the improvements of deep features (that are used for image and object class representation) as more representative deep networks are learned over time. Our LMM provides an end-to-end approach for jointly learning: (a) the deep networks to extract more discriminative deep features for image and object class representation; (b) the tree classifier for recognizing large numbers of object classes hierarchically; and (c) the visual hierarchy adaptation for achieving more accurate indexing of large numbers of object classes hierarchically. By supporting joint learning of the tree classifier, the deep networks and the visual hierarchy adaptation, our LMM algorithm provides an effective approach for controlling inter-level error propagation and can thus achieve better accuracy rates on large-scale visual recognition. Our experiments are carried out on the ImageNet1K and ImageNet10K image sets, and our LMM algorithm achieves very competitive results on both accuracy rates and computational efficiency compared with the baseline methods.